Efficient Matrix Multiplication and Ranking for Collaborative Filtering

I am working on a recommender system side project and came across LIBMF (https://github.com/cjlin1/libmf), a C++ package that implements collaborative filtering with BPR, trained by parallel SGD. The model outputs the matrices P (#users × k) and Q (#items × k) as .txt files. I am trying to multiply them to get the estimated matrix R = P Qᵀ and then rank each row to get the top N recommendations for each customer.



The package I am using doesn't offer a recommend_top_n API, so I have been researching efficient solutions for large dense matrix multiplication, including Spark's IndexedRowMatrix and SciPy with .h5 files.



Dimensions:



P (300,000 × 32) × Qᵀ (32 × 250,000) = R (300,000 × 250,000). I rounded the values to the smallest type that fits (int16, 2 bytes per entry), so the resulting R matrix should be around 150 GB. I am currently using an EC2 instance with 160 GB RAM (m4.10xlarge). I write the top N results for each user to a table for a dashboard to pick up, rather than serving them from a web service.
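For reference, the 150 GB figure can be checked with a quick calculation. This is only a sketch of the arithmetic: the shapes are the ones stated above, the helper name `dense_size_gb` is made up for illustration, and the float32 and float64 rows are included purely for comparison.

```python
# Back-of-the-envelope size of the dense score matrix R = P @ Q.T.
# Shapes come from the question; dense_size_gb is a hypothetical helper.
import numpy as np

n_users, n_items, k = 300_000, 250_000, 32

def dense_size_gb(rows: int, cols: int, dtype) -> float:
    """Size in decimal gigabytes (1 GB = 1e9 bytes) of a dense rows x cols array."""
    return rows * cols * np.dtype(dtype).itemsize / 1e9

for dtype in (np.int16, np.float32, np.float64):
    size = dense_size_gb(n_users, n_items, dtype)
    print(f"R as {np.dtype(dtype).name}: {size:,.0f} GB")

# The two factor matrices are tiny by comparison.
print(f"P as float32: {dense_size_gb(n_users, k, np.float32) * 1000:,.1f} MB")
print(f"Q as float32: {dense_size_gb(n_items, k, np.float32) * 1000:,.1f} MB")
```

The gap between R (hundreds of GB) and the factors (tens of MB) suggests that the memory pressure comes entirely from materializing R, not from the model itself.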



What would be a good solution for this case? I think the most straightforward way is to partition the user or item matrix into blocks, as the Strassen algorithm does, and multiply block by block. Is there a more elegant and efficient way, or should I simply upgrade my EC2 instance? I couldn't find much material on how people address this problem, and it seems odd that there are so few implementations, which makes me wonder whether there is a better solution.
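To make the partitioning idea concrete, here is a minimal sketch of what I mean by splitting the user matrix into row blocks. It assumes P and Q are already loaded as NumPy float arrays (parsing the LIBMF .txt output is not shown), and the function name `top_n_by_user_block` and the block size are my own illustrative choices. This is plain row-blocking, not the Strassen algorithm itself.

```python
import numpy as np

def top_n_by_user_block(P, Q, n=10, block_size=1024):
    """Return an (n_users, n) array of item indices, best first, while holding
    at most block_size rows of R in memory at a time. Requires n <= n_items."""
    n_users = P.shape[0]
    top_items = np.empty((n_users, n), dtype=np.int64)
    for start in range(0, n_users, block_size):
        stop = min(start + block_size, n_users)
        scores = P[start:stop] @ Q.T          # shape (block, n_items)
        # argpartition places the n largest scores of each row in the last
        # n columns (in no particular order) without a full sort.
        cand = np.argpartition(scores, -n, axis=1)[:, -n:]
        cand_scores = np.take_along_axis(scores, cand, axis=1)
        # Sort only those n candidates per row, highest score first.
        order = np.argsort(-cand_scores, axis=1)
        top_items[start:stop] = np.take_along_axis(cand, order, axis=1)
    return top_items

# Small random stand-ins for the real factors.
rng = np.random.default_rng(0)
P = rng.standard_normal((500, 32)).astype(np.float32)
Q = rng.standard_normal((300, 32)).astype(np.float32)
top = top_n_by_user_block(P, Q, n=5, block_size=128)
print(top.shape)
```

At the sizes above, a block of 1,024 users against 250,000 items in float32 would be roughly 1 GB of scores per block, so the block size is the knob that trades memory for the number of multiplications.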



My main code is in Python, but I am happy to use any language or tool for the matrix multiplication and ranking. Or should I skip the multiplication altogether and use a different collaborative filtering package that predicts the top N recommendations directly?










python matrix bigdata matrix-multiplication recommender-systems






asked Mar 21 at 15:45 by Tina Bu (edited Mar 21 at 18:08)












  • Exploit the algebra and decide on your trade-offs. You only need one matrix-vector multiplication per user (the result is a vector of item scores), and you could even discard scores that are already worse than the current top k. There is no memory problem here; this could be calculated on a modern phone. Renting costly high-memory machines looks like total overkill.

    – sascha
    Mar 21 at 18:47
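One way to read the comment above is sketched below: a single user's factor vector is multiplied against Q chunk by chunk, and only the best n (score, item) pairs seen so far are kept. The names `top_n_for_user`, `p_u`, and `item_chunk` are hypothetical, and the inputs are assumed to be NumPy arrays; this is an illustration of the idea, not necessarily what the commenter had in mind.

```python
import numpy as np

def top_n_for_user(p_u, Q, n=10, item_chunk=50_000):
    """Top-n item indices and scores for one user vector p_u of shape (k,).
    Q is scanned in chunks; scores that cannot enter the top n are dropped.
    If Q has fewer than n rows, unused slots stay at index -1 and score -inf."""
    best_scores = np.full(n, -np.inf, dtype=np.float64)
    best_items = np.full(n, -1, dtype=np.int64)
    for start in range(0, Q.shape[0], item_chunk):
        scores = Q[start:start + item_chunk] @ p_u   # matrix-vector product
        # Anything not above the current n-th best cannot enter the top n.
        keep = np.flatnonzero(scores > best_scores.min())
        if keep.size == 0:
            continue
        merged_scores = np.concatenate([best_scores, scores[keep]])
        merged_items = np.concatenate([best_items, keep + start])
        order = np.argsort(-merged_scores)[:n]       # highest score first
        best_scores, best_items = merged_scores[order], merged_items[order]
    return best_items, best_scores

# Small random stand-ins for one user and the item factors.
rng = np.random.default_rng(0)
Q = rng.standard_normal((1_000, 32))
p_u = rng.standard_normal(32)
items, scores = top_n_for_user(p_u, Q, n=5, item_chunk=300)
print(items, scores)
```

Per user, the working set is one chunk of scores plus n running candidates, so peak memory does not depend on the number of users. Looping over users in pure Python would likely be slow at 300,000 users, so in practice this idea would probably be combined with batching several users per multiplication.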

















