Efficient Matrix Multiplication and Ranking for Collaborative Filtering
I am working on a recommender-system side project and came across a C++ package (https://github.com/cjlin1/libmf) that implements collaborative filtering for BPR with parallel SGD. The model outputs the matrices P (#users × k) and Q (#items × k) as .txt files. I am trying to multiply them to get the estimated R and then rank each row to get the top N recommendations for each customer.
The package I am using doesn't offer a recommend_top_n API, so I have been researching efficient solutions for large dense matrix multiplication, including Spark's IndexedRowMatrix and SciPy with an .h5 file.
Dimensions:
P (300,000 × 32) × Qᵀ (32 × 250,000) = R (300,000 × 250,000)
I rounded the values to the smallest possible type (int16), so the resulting R matrix should be around 150 GB (300,000 × 250,000 entries at 2 bytes each). I am currently using an EC2 instance with 160 GB of RAM (m4.10xlarge). I write the top N results for each user to a table for a dashboard to pick up, rather than serving them from a web service.
I was wondering what would be a good solution for this case. I think the most straightforward way is to partition the user or item matrix into blocks, as in the Strassen algorithm. But is there a more elegant and efficient way to do it, or should I simply upgrade my EC2 instance? I couldn't find much material on how people address this problem, and it seems odd to me that there are so few implementations of it, which makes me wonder whether there is a better solution.
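The partitioning idea can be sketched with plain NumPy, assuming P and Q are already loaded as arrays of shape (#users, k) and (#items, k). The function name top_n_by_block and the block_size parameter are hypothetical and not part of any package mentioned here. Only one block of rows of R exists at a time, and np.argpartition selects the top N per row without fully sorting it.

```python
import numpy as np

def top_n_by_block(P, Q, n=10, block_size=1024):
    """Return (item indices, scores) of the n best items per user, best first.

    P: (n_users, k) user factors; Q: (n_items, k) item factors.
    Only a (block_size, n_items) slice of R = P @ Q.T is held in memory.
    Requires n <= n_items.
    """
    n_users = P.shape[0]
    top_idx = np.empty((n_users, n), dtype=np.int64)
    top_val = np.empty((n_users, n), dtype=np.result_type(P.dtype, Q.dtype))
    for start in range(0, n_users, block_size):
        stop = min(start + block_size, n_users)
        scores = P[start:stop] @ Q.T                      # (block, n_items)
        # Unordered top n per row; cheaper than sorting every row.
        part = np.argpartition(scores, -n, axis=1)[:, -n:]
        part_val = np.take_along_axis(scores, part, axis=1)
        # Sort only the n selected candidates, highest score first.
        order = np.argsort(-part_val, axis=1)
        top_idx[start:stop] = np.take_along_axis(part, order, axis=1)
        top_val[start:stop] = np.take_along_axis(part_val, order, axis=1)
    return top_idx, top_val

# Small random stand-ins for the real factor matrices.
rng = np.random.default_rng(0)
P = rng.standard_normal((50, 8)).astype(np.float32)
Q = rng.standard_normal((40, 8)).astype(np.float32)
idx, val = top_n_by_block(P, Q, n=5, block_size=16)
```

At the sizes in the question, a float32 block of 1,024 users is about 1 GB, plus the temporary index array that argpartition returns, so block_size is the knob that trades memory for the number of matrix products.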
My main code is in Python, but I am happy to use any language or tool for the matrix multiplication and ranking piece. Or should I skip the multiplication altogether and use a different collaborative filtering package that predicts the top N recommendations directly?
python matrix bigdata matrix-multiplication recommender-systems
Exploit the algebra and decide on your tradeoffs. You only need one matrix-vector multiplication per user (the result is a vector), and you could even discard results that are already worse than the current top N. There is no memory problem here; this could be calculated by a modern phone. Renting costly high-memory machines looks like total overkill.
– sascha
Mar 21 at 18:47
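One possible reading of this comment, as a hedged sketch rather than the commenter's own code: score the items for a single user with a matrix-vector product and keep a running top-N heap, skipping any score that cannot beat the current N-th best. The names top_n_for_user and item_chunk are hypothetical; splitting the items into chunks is optional at these sizes and is used here only to make the pruning step visible.

```python
import heapq
import numpy as np

def top_n_for_user(p_u, Q, n=10, item_chunk=50_000):
    """Return the n best (score, item_index) pairs for one user, best first."""
    heap = []  # min-heap holding the best n (score, item) pairs seen so far
    for start in range(0, Q.shape[0], item_chunk):
        scores = Q[start:start + item_chunk] @ p_u  # one matrix-vector product
        if len(heap) == n:
            # Ignore everything already worse than the current n-th best score.
            candidates = np.flatnonzero(scores > heap[0][0])
        else:
            candidates = np.arange(scores.shape[0])
        for j in candidates:
            item = (float(scores[j]), start + int(j))
            if len(heap) < n:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heappushpop(heap, item)
    return sorted(heap, reverse=True)

# Small random stand-ins for one row of P and the item matrix Q.
rng = np.random.default_rng(1)
Q = rng.standard_normal((200, 8))
p_u = rng.standard_normal(8)
best = top_n_for_user(p_u, Q, n=5, item_chunk=64)
```

The working set per user is the score vector plus N heap entries, which is why the full R never has to be materialized.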
asked Mar 21 at 15:45 by Tina Bu; edited Mar 21 at 18:08