Efficient Matrix Multiplication and Ranking for Collaborative Filtering

I am working on a recommender system side project and came across this C++ package, which implements collaborative filtering for BPR with parallel SGD (https://github.com/cjlin1/libmf). The model outputs the P (#users × k) and Q (#items × k) matrices as .txt files. I am trying to multiply them (R ≈ P Qᵀ) to get the estimated R and then rank each row to get the top N recommendations for each customer.
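On toy sizes, the naive pipeline described above is a single product followed by a row-wise sort. This is a minimal sketch, assuming the factors have already been read from the .txt files into NumPy arrays; the random data and sizes are hypothetical stand-ins. It is exactly this full `R_hat` that becomes too large at the real dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k, top_n = 6, 10, 4, 3   # toy sizes, not the real 300,000 x 250,000 x 32

# Hypothetical stand-ins for the factor matrices read from the .txt files.
P = rng.standard_normal((n_users, k)).astype(np.float32)  # one row per user
Q = rng.standard_normal((n_items, k)).astype(np.float32)  # one row per item

R_hat = P @ Q.T                               # (n_users, n_items) estimated scores
top = np.argsort(-R_hat, axis=1)[:, :top_n]   # top_n item indices per user, best first
print(top.shape)  # (6, 3)
```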



The package I am using doesn't offer a recommend_top_n API, so I have been researching efficient solutions for large dense matrix multiplication, including Spark's IndexedRowMatrix and SciPy with an .h5 file.



Dimensions:



P (300,000 × 32) × Qᵀ (32 × 250,000) = R (300,000 × 250,000). I rounded the values to the smallest possible type (int16), so the resulting R matrix should be around 150 GB (75 billion entries × 2 bytes). I am currently using an EC2 instance with 160 GB of RAM (m4.10xlarge). I am writing the top N results for each user to a table for a dashboard to pick up, instead of serving them as a web service.
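The size estimate above can be checked with a few lines of arithmetic. The sketch below (the helper `dense_bytes` is hypothetical) also shows how much memory one block of users would need if R were never materialized in full; the 1,000-user block size is an arbitrary assumption.

```python
def dense_bytes(rows: int, cols: int, bytes_per_entry: int) -> int:
    """Memory needed to hold a dense rows x cols matrix."""
    return rows * cols * bytes_per_entry

n_users, n_items, k = 300_000, 250_000, 32

full_int16 = dense_bytes(n_users, n_items, 2)           # full R stored as int16
full_float32 = dense_bytes(n_users, n_items, 4)         # full R stored as float32
block_float32 = dense_bytes(1_000, n_items, 4)          # one 1,000-user block as float32
factors_float32 = dense_bytes(n_users + n_items, k, 4)  # P and Q together as float32

print(f"full R, int16:   {full_int16 / 1e9:.0f} GB")
print(f"full R, float32: {full_float32 / 1e9:.0f} GB")
print(f"one block:       {block_float32 / 1e9:.0f} GB")
print(f"P and Q:         {factors_float32 / 1e6:.1f} MB")
```

The factors themselves come to about 70 MB; only the dense product is large.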



What would be a good solution in this case? I think the most straightforward way is to partition the user or item matrix into blocks, as the Strassen algorithm does. But is there a more elegant and efficient way to do it, or should I simply upgrade my EC2 instance? I couldn't find much material on how people address this problem, and it seems odd to me that there are so few implementations for it, which makes me wonder whether there is a better solution.
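The partitioning idea can be sketched as plain row blocking over users. This is ordinary blocking, not Strassen's algorithm, whose block recursion aims to reduce arithmetic rather than memory. It is a minimal sketch, assuming P and Q fit in memory as float32 NumPy arrays; the function name `blocked_top_n` and the default `block_size` are hypothetical. Each block of users is multiplied by `Q.T`, reduced to its top N items with `np.argpartition`, and discarded, so peak memory is roughly `block_size × n_items` scores instead of the full R.

```python
import numpy as np

def blocked_top_n(P: np.ndarray, Q: np.ndarray, n: int, block_size: int = 1000) -> np.ndarray:
    """Top-n item indices (best first) for every user, one block of users at a time.

    Only a (block_size, n_items) score block exists at any moment,
    so the full (n_users, n_items) matrix is never materialized.
    """
    n_users = P.shape[0]
    result = np.empty((n_users, n), dtype=np.int64)
    for start in range(0, n_users, block_size):
        stop = min(start + block_size, n_users)
        scores = P[start:stop] @ Q.T                       # (block, n_items) dense product
        idx = np.argpartition(scores, -n, axis=1)[:, -n:]  # unordered top n per row
        part = np.take_along_axis(scores, idx, axis=1)
        order = np.argsort(-part, axis=1)                  # sort only the n candidates
        result[start:stop] = np.take_along_axis(idx, order, axis=1)
    return result

rng = np.random.default_rng(1)
P = rng.standard_normal((2_500, 32)).astype(np.float32)   # toy user factors
Q = rng.standard_normal((4_000, 32)).astype(np.float32)   # toy item factors
top10 = blocked_top_n(P, Q, n=10, block_size=1000)
print(top10.shape)  # (2500, 10)
```

The blocks are independent of each other, so they could also be distributed across processes or machines and their top-N rows written straight to the output table.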



My main code is in Python, but I am happy to use any language or tool for the matrix multiplication and ranking step. Or should I skip the multiplication altogether and use a different collaborative filtering package that predicts the top N recommendations directly?










python matrix bigdata matrix-multiplication recommender-systems














edited Mar 21 at 18:08

asked Mar 21 at 15:45 by Tina Bu (reputation 65)












  • Exploit the algebra and decide on your tradeoffs. You only need one matrix-vector multiplication per user (the result is a vector), and you could even discard results that are already worse than the current top k. There is no memory problem here; this could be computed on a modern phone. Renting costly high-memory machines looks like total overkill.

    – sascha
    Mar 21 at 18:47
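The comment's suggestion can be sketched per user: score one user against the items with matrix-vector products over item chunks, and keep a running top k so that candidates which cannot beat the current k-th best are dropped immediately. This is a hedged illustration of the idea, not code from the comment; the function name `user_top_k`, the chunk size, and the random toy data are hypothetical.

```python
import numpy as np

def user_top_k(p_u: np.ndarray, Q: np.ndarray, k: int, chunk: int = 50_000):
    """Return (item_ids, scores) of the k best items for one user vector p_u, best first."""
    best_ids = np.empty(0, dtype=np.int64)
    best_scores = np.empty(0, dtype=Q.dtype)
    for start in range(0, Q.shape[0], chunk):
        scores = Q[start:start + chunk] @ p_u            # one matrix-vector product per chunk
        ids = np.arange(start, start + scores.size, dtype=np.int64)
        if best_scores.size == k:
            keep = scores > best_scores.min()            # ignore items already worse than the top k
            ids, scores = ids[keep], scores[keep]
        # Merge the surviving candidates with the running top k.
        cand_ids = np.concatenate([best_ids, ids])
        cand_scores = np.concatenate([best_scores, scores])
        if cand_scores.size > k:
            sel = np.argpartition(cand_scores, -k)[-k:]
            cand_ids, cand_scores = cand_ids[sel], cand_scores[sel]
        best_ids, best_scores = cand_ids, cand_scores
    order = np.argsort(-best_scores)
    return best_ids[order], best_scores[order]

rng = np.random.default_rng(2)
Q = rng.standard_normal((120_000, 32)).astype(np.float32)  # toy item factors
p_u = rng.standard_normal(32).astype(np.float32)           # one toy user vector
ids, scores = user_top_k(p_u, Q, k=10)
print(ids, scores)
```

Running this once per row of P covers all users; batching several users into one matrix-matrix product tends to be faster in NumPy than many separate matrix-vector products.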

















