Efficient Matrix Multiplication and Ranking for Collaborative Filtering


I am working on a recommender system side project and came across this C++ package, which implements collaborative filtering with BPR using parallel SGD (https://github.com/cjlin1/libmf). The model outputs the P (#users * k) and Q (#items * k) matrices as .txt files. I am trying to multiply them to get the estimated R and then rank each row to get the top N recommendations for each customer.

The package I am using doesn't offer a recommend_top_n API, so I have been researching efficient solutions for large dense matrix multiplication, including Spark IndexedRowMatrix and SciPy with an .h5 file.

Dimensions:

P (300,000 * 32) * Q transposed (32 * 250,000) = R (300,000 * 250,000). I rounded the values to the smallest possible type (int16), so the resulting R matrix should be around 150 GB. I am currently using an EC2 instance with 160 GB of RAM (m4.10xlarge). I am writing the top N results for each user to a table for a dashboard to pick up, instead of offering them as a web service.
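As a quick sanity check on the 150 GB figure, here is the arithmetic as a small sketch. The helper name `dense_matrix_bytes` is made up for illustration, and the float32 lines are an assumption about how the factors might be stored, not something libmf prescribes.

```python
def dense_matrix_bytes(n_rows: int, n_cols: int, bytes_per_value: int) -> int:
    """Bytes needed to hold a dense n_rows x n_cols matrix in memory."""
    return n_rows * n_cols * bytes_per_value


n_users, n_items, k = 300_000, 250_000, 32

r_int16 = dense_matrix_bytes(n_users, n_items, 2)    # full R stored as int16
r_float32 = dense_matrix_bytes(n_users, n_items, 4)  # full R stored as float32
# P and Q together, assuming float32 factors
factors_float32 = dense_matrix_bytes(n_users + n_items, k, 4)

print(r_int16 / 1e9)          # 150.0 (GB)
print(r_float32 / 1e9)        # 300.0 (GB)
print(factors_float32 / 1e6)  # 70.4 (MB)
```

The two factor matrices are tiny compared with R, which is why it is only the full product that strains the 160 GB machine.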

What would be a good solution for this case? I think the most straightforward way is to partition the user or item matrix into blocks, similar in spirit to the Strassen algorithm. But is there a more elegant and efficient way to do it, or should I simply upgrade my EC2 instance? I couldn't find much material on how people address this problem, and it seems odd to me that there are so few implementations, which makes me wonder whether there is a better solution.
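One possible reading of the partitioning idea, as a minimal NumPy sketch rather than a tested production solution: multiply one block of users at a time and keep only the top N item indices per row, so the full R is never held in memory. The function name `top_n_blocked` and the default block size are illustrative assumptions, and the sketch assumes P and Q are already loaded as float arrays and that N does not exceed the number of items.

```python
import numpy as np


def top_n_blocked(P, Q, n=10, block_size=1024):
    """Return an (n_users, n) array of item indices, best first per user.

    Only a (block_size, n_items) slice of R exists at any one time.
    """
    n_users = P.shape[0]
    top_items = np.empty((n_users, n), dtype=np.int64)
    for start in range(0, n_users, block_size):
        stop = min(start + block_size, n_users)
        scores = P[start:stop] @ Q.T  # shape (block, n_items)
        # argpartition selects the n largest per row without a full sort;
        # the selected candidates come back in arbitrary order.
        cand = np.argpartition(scores, -n, axis=1)[:, -n:]
        cand_scores = np.take_along_axis(scores, cand, axis=1)
        # Sort only the n candidates per row, highest score first.
        order = np.argsort(-cand_scores, axis=1)
        top_items[start:stop] = np.take_along_axis(cand, order, axis=1)
    return top_items


# Small random stand-ins for the real factor matrices.
rng = np.random.default_rng(0)
P_demo = rng.standard_normal((50, 8))
Q_demo = rng.standard_normal((40, 8))
demo_top = top_n_blocked(P_demo, Q_demo, n=5, block_size=16)
print(demo_top.shape)  # (50, 5)
```

With 250,000 items and float32 scores, a block of 1,024 users needs roughly 1 GB for the intermediate scores, so the block size is the knob that trades memory for the number of multiplications.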

My main code is in Python, but I am happy to use any language or tool for this matrix multiplication and ranking piece. Or should I not be doing the multiplication at all, and instead use a different collaborative filtering package that can predict the top N recommendations directly?

edited Mar 21 at 18:08

asked Mar 21 at 15:45

Tina Bu

Exploit the algebra and decide on your tradeoffs. You only need one matrix-vector multiplication per user (the result is a vector), and you could even ignore results that are already worse than the current top k. No memory problem here. This could be calculated by a modern phone. Renting costly high-memory machines looks like total overkill.

– sascha
Mar 21 at 18:47
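A hedged sketch of what the comment seems to suggest: score one user with matrix-vector products over chunks of items, keep a running top N in a min-heap, and discard anything that cannot beat the current N-th best. The name `top_n_for_user` and the chunk size are assumptions made for illustration, not part of libmf or of the comment itself.

```python
import heapq

import numpy as np


def top_n_for_user(p_u, Q, n=10, item_block=50_000):
    """Return the top-n (score, item_index) pairs for one user, best first."""
    heap = []  # min-heap; heap[0] is the worst pair currently kept
    for start in range(0, Q.shape[0], item_block):
        # One matrix-vector product per chunk of items.
        scores = Q[start:start + item_block] @ p_u
        if len(heap) == n:
            # Ignore everything already worse than the current n-th best.
            keep = np.flatnonzero(scores > heap[0][0])
        else:
            keep = np.arange(scores.shape[0])
        for j in keep:
            pair = (float(scores[j]), start + int(j))
            if len(heap) < n:
                heapq.heappush(heap, pair)
            elif pair > heap[0]:
                heapq.heappushpop(heap, pair)
    return sorted(heap, reverse=True)


# Small random stand-ins for one row of P and the full Q.
rng = np.random.default_rng(1)
Q_small = rng.standard_normal((1000, 32))
p_small = rng.standard_normal(32)
best = top_n_for_user(p_small, Q_small, n=5, item_block=128)
print(len(best))  # 5
```

Memory use here is bounded by one chunk of scores plus N heap entries per user, and users are independent of each other, so the loop over users could be spread across processes if needed.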


python matrix bigdata matrix-multiplication recommender-systems
