Is there an R function for returning sorted indexes of any values of a vector?

I'm not fluent in R data.table, and any help resolving the following problem would be greatly appreciated!
I have a big data.table (~1,000,000 rows) with columns of numeric values, and I want to output a data.table of the same dimensions that holds, for each row, the sorted index position (rank) of each value within that row.

A short example:

Input:

dt = data.frame(ack = 1:7)

dt$A1 = c( 1, 6, 9, 10, 3, 5, NA)
dt$A2 = c( 25, 12, 30, 10, 50, 1, 30)
dt$A3 = c( 100, 63, 91, 110, 1, 4, 10)
dt$A4 = c( 51, 65, 2, 1, 0, 200, 1)

First row: 1 (1) <= 25 (2) <= 51 (3) <= 100 (4),
so the sorted index positions for (1, 25, 100, 51) are (1, 2, 4, 3) and the output should be:

dt$PosA1 = c(1, ...
dt$PosA2 = c(2, ...
dt$PosA3 = c(4, ...
dt$PosA4 = c(3, ...

Third row: 2 (1) <= 9 (2) <= 30 (3) <= 91 (4), so after the first three rows the output must be:

dt$PosA1 = c( 1,1,2,...)
dt$PosA2 = c( 2,2,3,...)
dt$PosA3 = c( 4,3,4,...)
dt$PosA4 = c( 3,4,1,...)

The output has the same dimensions as the input data.table and is filled with the sorted index positions within each row:

dt$PosA1 = c( 1, 1, 2, 2, 3, 3, NA)
dt$PosA2 = c( 2, 2, 3, 3, 4, 1, 3)
dt$PosA3 = c( 4, 3, 4, 4, 2, 2, 2)
dt$PosA4 = c( 3, 4, 1, 1, 1, 4, 1)

I am thinking of something like this:

library(data.table)
setDT(dt)

# pseudocode: rowPosition() is a hypothetical function, it does not exist
dt[, `:=`(PosA1 = rowPosition(.SD, 1, na.rm = TRUE),
          PosA2 = rowPosition(.SD, 2, na.rm = TRUE),
          PosA3 = rowPosition(.SD, 3, na.rm = TRUE),
          PosA4 = rowPosition(.SD, 4, na.rm = TRUE)),
   .SDcols = c("A1", "A2", "A3", "A4")]

I'm not sure of the syntax, and I am missing a rowPosition function (the name is my own). Does an existing function do this?

A little help coding an efficient one, or another approach to the problem, would be great!

Regards.

edited Mar 27 at 21:30 by Frank

asked Mar 27 at 20:38 by Pascal

Tags: r, function, sorting, data.table, row


2 Answers

Since you are looking for speed, you might want to consider using Rcpp. An Rcpp rank function that handles NA and ties can be found in nrussell's adapted version of René Richter's code.

library(data.table)

# simulated data with the same size as the real table
nr <- 811e3
nc <- 16
DT <- as.data.table(matrix(sample(c(1:200, NA), nr*nc, replace=TRUE), nrow=nr))[, 
    ack := .I]

# assuming that you have saved nrussell's code in rcpp/avg_rank.cpp
library(Rcpp)
system.time(sourceCpp("rcpp/avg_rank.cpp"))
#   user  system elapsed 
#   0.00    0.13    6.21 

# row-wise ranks via the compiled avg_rank
nruss_rcpp <- function() {
    DT[, as.list(avg_rank(unlist(.SD))), by=ack]
}

# row-wise ranks via melt, frank and dcast
data.table.frank <- function() {
    melt(DT, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][, 
        dcast(.SD, ack ~ variable, value.var="f")]
}

library(microbenchmark)
microbenchmark(nruss_rcpp(), data.table.frank(), times=3L)

Timings:

Unit: seconds
               expr       min        lq     mean   median        uq       max neval cld
       nruss_rcpp()  10.33032  10.33251  10.3697  10.3347  10.38939  10.44408     3  a 
 data.table.frank() 610.44869 612.82685 613.9362 615.2050 615.68001 616.15501     3   b

Edit, addressing the comments:

1) Set the names of the rank columns by updating by reference:

DT[, (paste0("Rank", 1L:nc)) := as.list(avg_rank(unlist(.SD))), by=ack]

2) Keeping NAs as they are:

Option A) Change the ranks to NA in R after getting the output from avg_rank:

for (j in 1:nc) 
 DT[is.na(get(paste0("V", j))), (paste0("Rank", j)) := NA_real_]
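As a small illustration of option A on toy data, the sketch below ranks each row with NAs placed last and then blanks out the rank of every cell that was NA in the input. It uses base R's rank in place of avg_rank (an assumption, so that it runs without compiling anything), and the matrix m is made up for the example.

```r
# toy input: two rows, one of them containing an NA (made-up values)
m <- matrix(c(NA, 30,  10, 1,
              10, 10, 110, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, paste0("V", 1:4)))

# rank each row; NAs are ranked last and ties get the average rank
ranks <- t(apply(m, 1, rank, na.last = TRUE, ties.method = "average"))

# option A: overwrite the rank of every cell that was NA in the input
ranks[is.na(m)] <- NA_real_

ranks
```

The mask `is.na(m)` has the same shape as `ranks`, so a single assignment covers all columns; the loop in option A does the same thing one column at a time on the data.table.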

Option B) Amend the avg_rank code in Rcpp as follows:

// Comparator and the rest of avg_rank.cpp are assumed unchanged from nrussell's version
Rcpp::NumericVector avg_rank(Rcpp::NumericVector x)
{
    R_xlen_t sz = x.size();
    // w holds the indices of x in sorted order
    Rcpp::IntegerVector w = Rcpp::seq(0, sz - 1);
    std::sort(w.begin(), w.end(), Comparator(x));

    Rcpp::NumericVector r = Rcpp::no_init_vector(sz);
    for (R_xlen_t n, i = 0; i < sz; i += n) {
        // n counts the run of tied values starting at sorted position i
        n = 1;
        while (i + n < sz && x[w[i]] == x[w[i + n]]) ++n;
        for (R_xlen_t k = 0; k < n; k++) {
            if (Rcpp::traits::is_na<REALSXP>(x[w[i + k]])) { // additional code
                r[w[i + k]] = NA_REAL;                         // additional code
            } else {
                r[w[i + k]] = i + (n + 1) / 2.;
            }
        }
    }

    return r;
}
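To make the tie handling in the loop above easier to follow, here is a plain-R sketch of the same idea: sort the indices, find each run of equal values, and give every member of the run the run's mean rank, keeping NA as NA. The name avg_rank_r is hypothetical, the code is only an illustration and not the Rcpp implementation, and it will be much slower than compiled code.

```r
# plain-R mirror of the tie-averaging loop (illustrative, not the Rcpp code)
avg_rank_r <- function(x) {
  w  <- order(x, na.last = TRUE)   # indices that sort x, NAs at the end
  sz <- length(x)
  r  <- numeric(sz)
  i  <- 1
  while (i <= sz) {
    # n = length of the run of equal values starting at sorted position i
    n <- 1
    while (i + n <= sz && isTRUE(x[w[i]] == x[w[i + n]])) n <- n + 1
    for (k in 0:(n - 1)) {
      idx <- w[i + k]
      # NA input stays NA; otherwise each member of the run gets the mean rank
      r[idx] <- if (is.na(x[idx])) NA_real_ else (i - 1) + (n + 1) / 2
    }
    i <- i + n
  }
  r
}

avg_rank_r(c(10, 10, 110, 1))
avg_rank_r(c(NA, 30, 10, 1))
```

Because R indexes from 1 while the C++ loop indexes from 0, the mean rank of a run is written here as `(i - 1) + (n + 1) / 2`. On these inputs the result should agree with base R's `rank(x, na.last = "keep")`. Note that this gives average ranks for ties, whereas the frank call in the benchmark uses dense ranks.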

edited Apr 3 at 0:54

community wiki

4 revs
chinsoon12

Hello @chinsoon12, this should be great, but I don't know how to make avg_rank available in my RStudio environment (library(Rcpp) is not sufficient), and I don't know how to do the "#assuming that you have saved nrussell code in avg_rank.cpp" step.

– Pascal
Mar 29 at 16:10

Sorry for my low-level knowledge of R :(

– Pascal
Mar 29 at 16:12

I read TFM, got Rtools installed, sourced avg_rank.cpp, launched again and... Great!!! 20 s instead of 8 min! If I may ask for more: I would like NA values to stay NA, and to keep the column names instead of V1...VN. Thanks a lot!

– Pascal
Mar 29 at 17:45

You got so far in a few hours. These last 2 questions are nothing to you.

– chinsoon12
Mar 30 at 0:13

:)) Thanks for encouraging me to read harder. I resolved the "columns" question (dt[, (cols) := ....]), but inspecting and modifying nrussell's code is too hard for me at the moment. As a workaround, I can compare the result table with the original and keep the result value where the original is not NA, else NA. But the smart way, in one call, would be to give avg_rank() a parameter like na.last = "keep" to take this exception into account.

– Pascal
Mar 30 at 9:33


You can convert to long form and use rank. Or, since you're using data.table, frank:

library(data.table)
setDT(dt)
melt(dt, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][, 
 dcast(.SD, ack ~ variable, value.var="f")]

   ack A1 A2 A3 A4
1:   1  1  2  4  3
2:   2  1  2  3  4
3:   3  2  3  4  1
4:   4  2  2  3  1
5:   5  3  4  2  1
6:   6  3  1  2  4
7:   7 NA  3  2  1

melt switches to long form, while dcast converts back to wide form.
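For readers who want to check the result on the small example without reshaping, the following is a minimal base-R sketch of the same dense row ranking. The helper name dense_rank and the Pos* column names are assumptions for illustration, and no claim is made about its speed on large tables.

```r
# the example data from the question
dt <- data.frame(ack = 1:7,
                 A1 = c(  1,  6,  9,  10,  3,   5, NA),
                 A2 = c( 25, 12, 30,  10, 50,   1, 30),
                 A3 = c(100, 63, 91, 110,  1,   4, 10),
                 A4 = c( 51, 65,  2,   1,  0, 200,  1))
cols <- c("A1", "A2", "A3", "A4")

# dense rank of one vector: position of each value among the sorted distinct
# values; sort() drops NA, so NA inputs match nothing and stay NA
dense_rank <- function(x) match(x, sort(unique(x)))

# apply() returns one column per input row, hence the transpose
pos <- t(apply(as.matrix(dt[cols]), 1, dense_rank))
colnames(pos) <- paste0("Pos", cols)
pos
```

Each row of `pos` should match the corresponding row of the frank output above, including the tie in row 4 and the NA in row 7.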

answered Mar 27 at 21:35 by Frank

Thanks @Frank, but I encounter an error: Error in melt.data.table(dt, id = "ack") : One or more values in 'id.vars' is invalid.

– Pascal
Mar 27 at 22:19

@Pascal You will need to create a row-ID column, like dt[, ack := .I] or dt$ack <- seq_len(nrow(dt)). I'm using the code from your post after I edited it so that it is copy-pastable. You can look above to see what I mean. Of course, you don't need to name it ack :)

– Frank
Mar 27 at 22:39

It works fine and does the job!

– Pascal
Mar 27 at 23:11

But when I time it with Sys.time() on my data.table (811,000 x 16), it takes about 8 min on a 4-core i5 vPro 8th gen with 16 GB RAM. Is there a way to reduce this duration, or should I consider it a good result?

– Pascal
Mar 27 at 23:18

Thanks a lot for this solution! I will drink a lot of coffee while waiting for something better :)

– Pascal
Mar 28 at 0:04


Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55386049%2fis-there-an-r-function-for-returning-sorted-indexes-of-any-values-of-a-vector%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Since you are looking for speed, you might want to consider using Rcpp. A Rcpp rank that takes care of NA and ties can be found in nrussell's adapted version of René Richter's code.

nr <- 811e3
nc <- 16
DT <- as.data.table(matrix(sample(c(1:200, NA), nr*nc, replace=TRUE), nrow=nr))[, 
 ack := .I]

#assuming that you have saved nrussell code in avg_rank.cpp
library(Rcpp)
system.time(sourceCpp("rcpp/avg_rank.cpp"))
# user system elapsed 
# 0.00 0.13 6.21 

nruss_rcpp <- function() 
 DT[, as.list(avg_rank(unlist(.SD))), by=ack]


data.table.frank <- function() 
 melt(DT, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][, 
 dcast(.SD, ack ~ variable, value.var="f")]



library(microbenchmark)
microbenchmark(nruss_rcpp(), data.table.frank(), times=3L)

timings:

Unit: seconds
 expr min lq mean median uq max neval cld
 nruss_rcpp() 10.33032 10.33251 10.3697 10.3347 10.38939 10.44408 3 a 
 data.table.frank() 610.44869 612.82685 613.9362 615.2050 615.68001 616.15501 3 b

edit: addressing comments

1) set column names for rank columns using updating by reference

DT[, (paste0("Rank", 1L:nc)) := as.list(avg_rank(unlist(.SD))), by=ack]

2) keeping NAs as it is

option A) change to NA in R after getting output from avg_rank:

for (j in 1:nc) 
 DT[is.na(get(paste0("V", j))), (paste0("Rank", j)) := NA_real_]

option B) amend the avg_rank code in Rcpp as follows:

Rcpp::NumericVector avg_rank(Rcpp::NumericVector x)

 R_xlen_t sz = x.size();
 Rcpp::IntegerVector w = Rcpp::seq(0, sz - 1);
 std::sort(w.begin(), w.end(), Comparator(x));

 Rcpp::NumericVector r = Rcpp::no_init_vector(sz);
 for (R_xlen_t n, i = 0; i < sz; i += n) 
 n = 1;
 while (i + n < sz && x[w[i]] == x[w[i + n]]) ++n;
 for (R_xlen_t k = 0; k < n; k++) 
 if (Rcpp::traits::is_na<REALSXP>(x[w[i + k]])) #additional code
 r[w[i + k]] = NA_REAL; #additional code
 else 
 r[w[i + k]] = i + (n + 1) / 2.;
 
 
 

 return r;

edited Apr 3 at 0:54

community wiki

4 revs
chinsoon12

hello @chinsoon12, should be great but i don't now how avg_rank can be available from my Rstudio envt (library(Rcpp) is not sufficient , and i don't know how to ```` #assuming that you have saved nrussell code in avg_rank.cpp.

– Pascal
Mar 29 at 16:10

sorry for my low-level knowledge in R :(

– Pascal
Mar 29 at 16:12

I red TFM, got Rtool installed and source avg_rank.cpp and launch again and.... Greaaaaaat !!! 20s instead 8mn !!!! If I can abuse. I would like NA value stay NA and keep Columns name instead V1...VN. Thx a lot !!!!!!

– Pascal
Mar 29 at 17:45

You got so far in a few hours. These last 2 questions are nothing to you.

– chinsoon12
Mar 30 at 0:13

:)) thanks too encourage me to read harder. I resolve "columns" question (dt[, (cols) = ....]), but inspecting and modifying nrussel code is too hard for me at the moment. So i can get around in looking for a way to compare values of result table and orig and print result values if not NA else NA. But the smart way, in one call, would be to give avg_rank( ) a parameter like na.last = "keep" to take this exception in count).

– Pascal
Mar 30 at 9:33

|
show 2 more comments

Since you are looking for speed, you might want to consider using Rcpp. A Rcpp rank that takes care of NA and ties can be found in nrussell's adapted version of René Richter's code.

nr <- 811e3
nc <- 16
DT <- as.data.table(matrix(sample(c(1:200, NA), nr*nc, replace=TRUE), nrow=nr))[, 
 ack := .I]

#assuming that you have saved nrussell code in avg_rank.cpp
library(Rcpp)
system.time(sourceCpp("rcpp/avg_rank.cpp"))
# user system elapsed 
# 0.00 0.13 6.21 

nruss_rcpp <- function() 
 DT[, as.list(avg_rank(unlist(.SD))), by=ack]


data.table.frank <- function() 
 melt(DT, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][, 
 dcast(.SD, ack ~ variable, value.var="f")]



library(microbenchmark)
microbenchmark(nruss_rcpp(), data.table.frank(), times=3L)

timings:

Unit: seconds
 expr min lq mean median uq max neval cld
 nruss_rcpp() 10.33032 10.33251 10.3697 10.3347 10.38939 10.44408 3 a 
 data.table.frank() 610.44869 612.82685 613.9362 615.2050 615.68001 616.15501 3 b

edit: addressing comments

1) set column names for rank columns using updating by reference

DT[, (paste0("Rank", 1L:nc)) := as.list(avg_rank(unlist(.SD))), by=ack]

2) keeping NAs as it is

option A) change to NA in R after getting output from avg_rank:

for (j in 1:nc) 
 DT[is.na(get(paste0("V", j))), (paste0("Rank", j)) := NA_real_]

option B) amend the avg_rank code in Rcpp as follows:

Rcpp::NumericVector avg_rank(Rcpp::NumericVector x)

 R_xlen_t sz = x.size();
 Rcpp::IntegerVector w = Rcpp::seq(0, sz - 1);
 std::sort(w.begin(), w.end(), Comparator(x));

 Rcpp::NumericVector r = Rcpp::no_init_vector(sz);
 for (R_xlen_t n, i = 0; i < sz; i += n) 
 n = 1;
 while (i + n < sz && x[w[i]] == x[w[i + n]]) ++n;
 for (R_xlen_t k = 0; k < n; k++) 
 if (Rcpp::traits::is_na<REALSXP>(x[w[i + k]])) #additional code
 r[w[i + k]] = NA_REAL; #additional code
 else 
 r[w[i + k]] = i + (n + 1) / 2.;
 
 
 

 return r;

edited Apr 3 at 0:54

community wiki

4 revs
chinsoon12

hello @chinsoon12, should be great but i don't now how avg_rank can be available from my Rstudio envt (library(Rcpp) is not sufficient , and i don't know how to ```` #assuming that you have saved nrussell code in avg_rank.cpp.

– Pascal
Mar 29 at 16:10

sorry for my low-level knowledge in R :(

– Pascal
Mar 29 at 16:12

I red TFM, got Rtool installed and source avg_rank.cpp and launch again and.... Greaaaaaat !!! 20s instead 8mn !!!! If I can abuse. I would like NA value stay NA and keep Columns name instead V1...VN. Thx a lot !!!!!!

– Pascal
Mar 29 at 17:45

You got so far in a few hours. These last 2 questions are nothing to you.

– chinsoon12
Mar 30 at 0:13

:)) thanks too encourage me to read harder. I resolve "columns" question (dt[, (cols) = ....]), but inspecting and modifying nrussel code is too hard for me at the moment. So i can get around in looking for a way to compare values of result table and orig and print result values if not NA else NA. But the smart way, in one call, would be to give avg_rank( ) a parameter like na.last = "keep" to take this exception in count).

– Pascal
Mar 30 at 9:33

|
show 2 more comments

Since you are looking for speed, you might want to consider using Rcpp. A Rcpp rank that takes care of NA and ties can be found in nrussell's adapted version of René Richter's code.

nr <- 811e3
nc <- 16
DT <- as.data.table(matrix(sample(c(1:200, NA), nr*nc, replace=TRUE), nrow=nr))[, 
 ack := .I]

#assuming that you have saved nrussell code in avg_rank.cpp
library(Rcpp)
system.time(sourceCpp("rcpp/avg_rank.cpp"))
# user system elapsed 
# 0.00 0.13 6.21 

nruss_rcpp <- function() 
 DT[, as.list(avg_rank(unlist(.SD))), by=ack]


data.table.frank <- function() 
 melt(DT, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][, 
 dcast(.SD, ack ~ variable, value.var="f")]



library(microbenchmark)
microbenchmark(nruss_rcpp(), data.table.frank(), times=3L)

timings:

Unit: seconds
 expr min lq mean median uq max neval cld
 nruss_rcpp() 10.33032 10.33251 10.3697 10.3347 10.38939 10.44408 3 a 
 data.table.frank() 610.44869 612.82685 613.9362 615.2050 615.68001 616.15501 3 b

edit: addressing comments

1) set column names for rank columns using updating by reference

DT[, (paste0("Rank", 1L:nc)) := as.list(avg_rank(unlist(.SD))), by=ack]

2) keeping NAs as it is

option A) change to NA in R after getting output from avg_rank:

for (j in 1:nc) 
 DT[is.na(get(paste0("V", j))), (paste0("Rank", j)) := NA_real_]

option B) amend the avg_rank code in Rcpp as follows:

Rcpp::NumericVector avg_rank(Rcpp::NumericVector x)

 R_xlen_t sz = x.size();
 Rcpp::IntegerVector w = Rcpp::seq(0, sz - 1);
 std::sort(w.begin(), w.end(), Comparator(x));

 Rcpp::NumericVector r = Rcpp::no_init_vector(sz);
 for (R_xlen_t n, i = 0; i < sz; i += n) 
 n = 1;
 while (i + n < sz && x[w[i]] == x[w[i + n]]) ++n;
 for (R_xlen_t k = 0; k < n; k++) 
 if (Rcpp::traits::is_na<REALSXP>(x[w[i + k]])) #additional code
 r[w[i + k]] = NA_REAL; #additional code
 else 
 r[w[i + k]] = i + (n + 1) / 2.;
 
 
 

 return r;

edited Apr 3 at 0:54

community wiki

4 revs
chinsoon12

Since you are looking for speed, you might want to consider using Rcpp. A Rcpp rank that takes care of NA and ties can be found in nrussell's adapted version of René Richter's code.

nr <- 811e3
nc <- 16
DT <- as.data.table(matrix(sample(c(1:200, NA), nr*nc, replace=TRUE), nrow=nr))[, 
 ack := .I]

#assuming that you have saved nrussell code in avg_rank.cpp
library(Rcpp)
system.time(sourceCpp("rcpp/avg_rank.cpp"))
# user system elapsed 
# 0.00 0.13 6.21 

nruss_rcpp <- function() 
 DT[, as.list(avg_rank(unlist(.SD))), by=ack]


data.table.frank <- function() 
 melt(DT, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][, 
 dcast(.SD, ack ~ variable, value.var="f")]



library(microbenchmark)
microbenchmark(nruss_rcpp(), data.table.frank(), times=3L)

timings:

Unit: seconds
 expr min lq mean median uq max neval cld
 nruss_rcpp() 10.33032 10.33251 10.3697 10.3347 10.38939 10.44408 3 a 
 data.table.frank() 610.44869 612.82685 613.9362 615.2050 615.68001 616.15501 3 b

edit: addressing comments

1) set column names for rank columns using updating by reference

DT[, (paste0("Rank", 1L:nc)) := as.list(avg_rank(unlist(.SD))), by=ack]

2) keeping NAs as it is

option A) change to NA in R after getting output from avg_rank:

for (j in 1:nc) 
 DT[is.na(get(paste0("V", j))), (paste0("Rank", j)) := NA_real_]

option B) amend the avg_rank code in Rcpp as follows:

Rcpp::NumericVector avg_rank(Rcpp::NumericVector x)

 R_xlen_t sz = x.size();
 Rcpp::IntegerVector w = Rcpp::seq(0, sz - 1);
 std::sort(w.begin(), w.end(), Comparator(x));

 Rcpp::NumericVector r = Rcpp::no_init_vector(sz);
 for (R_xlen_t n, i = 0; i < sz; i += n) 
 n = 1;
 while (i + n < sz && x[w[i]] == x[w[i + n]]) ++n;
 for (R_xlen_t k = 0; k < n; k++) 
 if (Rcpp::traits::is_na<REALSXP>(x[w[i + k]])) #additional code
 r[w[i + k]] = NA_REAL; #additional code
 else 
 r[w[i + k]] = i + (n + 1) / 2.;
 
 
 

 return r;

edited Apr 3 at 0:54

community wiki

4 revs
chinsoon12

edited Apr 3 at 0:54

community wiki

4 revs
chinsoon12

community wiki

4 revs
chinsoon12

community wiki

4 revs
chinsoon12

hello @chinsoon12, should be great but i don't now how avg_rank can be available from my Rstudio envt (library(Rcpp) is not sufficient , and i don't know how to ```` #assuming that you have saved nrussell code in avg_rank.cpp.

– Pascal
Mar 29 at 16:10

sorry for my low-level knowledge in R :(

– Pascal
Mar 29 at 16:12

I red TFM, got Rtool installed and source avg_rank.cpp and launch again and.... Greaaaaaat !!! 20s instead 8mn !!!! If I can abuse. I would like NA value stay NA and keep Columns name instead V1...VN. Thx a lot !!!!!!

– Pascal
Mar 29 at 17:45

You got so far in a few hours. These last 2 questions are nothing to you.

– chinsoon12
Mar 30 at 0:13

:)) thanks too encourage me to read harder. I resolve "columns" question (dt[, (cols) = ....]), but inspecting and modifying nrussel code is too hard for me at the moment. So i can get around in looking for a way to compare values of result table and orig and print result values if not NA else NA. But the smart way, in one call, would be to give avg_rank( ) a parameter like na.last = "keep" to take this exception in count).

– Pascal
Mar 30 at 9:33

|
show 2 more comments

hello @chinsoon12, should be great but i don't now how avg_rank can be available from my Rstudio envt (library(Rcpp) is not sufficient , and i don't know how to ```` #assuming that you have saved nrussell code in avg_rank.cpp.

– Pascal
Mar 29 at 16:10

sorry for my low-level knowledge in R :(

– Pascal
Mar 29 at 16:12

I red TFM, got Rtool installed and source avg_rank.cpp and launch again and.... Greaaaaaat !!! 20s instead 8mn !!!! If I can abuse. I would like NA value stay NA and keep Columns name instead V1...VN. Thx a lot !!!!!!

– Pascal
Mar 29 at 17:45

You got so far in a few hours. These last 2 questions are nothing to you.

– chinsoon12
Mar 30 at 0:13

:)) thanks too encourage me to read harder. I resolve "columns" question (dt[, (cols) = ....]), but inspecting and modifying nrussel code is too hard for me at the moment. So i can get around in looking for a way to compare values of result table and orig and print result values if not NA else NA. But the smart way, in one call, would be to give avg_rank( ) a parameter like na.last = "keep" to take this exception in count).

– Pascal
Mar 30 at 9:33

hello @chinsoon12, should be great but i don't now how avg_rank can be available from my Rstudio envt (library(Rcpp) is not sufficient , and i don't know how to ```` #assuming that you have saved nrussell code in avg_rank.cpp.

– Pascal
Mar 29 at 16:10

sorry for my low-level knowledge in R :(

– Pascal
Mar 29 at 16:12

I red TFM, got Rtool installed and source avg_rank.cpp and launch again and.... Greaaaaaat !!! 20s instead 8mn !!!! If I can abuse. I would like NA value stay NA and keep Columns name instead V1...VN. Thx a lot !!!!!!

– Pascal
Mar 29 at 17:45

You got so far in a few hours. These last 2 questions are nothing to you.

– chinsoon12
Mar 30 at 0:13

:)) thanks too encourage me to read harder. I resolve "columns" question (dt[, (cols) = ....]), but inspecting and modifying nrussel code is too hard for me at the moment. So i can get around in looking for a way to compare values of result table and orig and print result values if not NA else NA. But the smart way, in one call, would be to give avg_rank( ) a parameter like na.last = "keep" to take this exception in count).

– Pascal
Mar 30 at 9:33

|
show 2 more comments

You can convert to long form and use rank. Or, since you're using data.table, frank:

library(data.table)
setDT(dt)
melt(dt, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][, 
 dcast(.SD, ack ~ variable, value.var="f")]

 ack A1 A2 A3 A4
1: 1 1 2 4 3
2: 2 1 2 3 4
3: 3 2 3 4 1
4: 4 2 2 3 1
5: 5 3 4 2 1
6: 6 3 1 2 4
7: 7 NA 3 2 1

melt switches to long form; while dcast converts back to wide form.

answered Mar 27 at 21:35

Frank

59.6k6 gold badges67 silver badges143 bronze badges

Thx @Frank , but i encounter an error: Error in melt.data.table(dt, id = "ack") : One or more values in 'id.vars' is invalid.

– Pascal
Mar 27 at 22:19

@Pascal You will need to create a row-ID column, like dt[, ack := .I] or dt$ack <- seq_len(nrow(dt)). I'm using the code from your post after I edited it so that it is copy-pastable. You can look above to see what I mean. Of course, you don't need to name it ack :)

– Frank
Mar 27 at 22:39

it works fine and do the Job!

– Pascal
Mar 27 at 23:11

But if I Sys.time() on my data.table (811000 x 16 ) and take about 8mn on a 4 Core I5 vPro 8th Gen , 16Go RAM. Is there a way to optimize this duration or i should consider it's a good count ?

– Pascal
Mar 27 at 23:18

1

Thanks a lot for this solution ! i wil take lot of coffee cup i waiting for better :)!

– Pascal
Mar 28 at 0:04

|
show 1 more comment

You can convert to long form and use rank. Or, since you're using data.table, frank:

library(data.table)
setDT(dt)
melt(dt, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][, 
 dcast(.SD, ack ~ variable, value.var="f")]

 ack A1 A2 A3 A4
1: 1 1 2 4 3
2: 2 1 2 3 4
3: 3 2 3 4 1
4: 4 2 2 3 1
5: 5 3 4 2 1
6: 6 3 1 2 4
7: 7 NA 3 2 1

melt switches to long form; while dcast converts back to wide form.

answered Mar 27 at 21:35

Frank

59.6k6 gold badges67 silver badges143 bronze badges

Thx @Frank , but i encounter an error: Error in melt.data.table(dt, id = "ack") : One or more values in 'id.vars' is invalid.

– Pascal
Mar 27 at 22:19

@Pascal You will need to create a row-ID column, like dt[, ack := .I] or dt$ack <- seq_len(nrow(dt)). I'm using the code from your post after I edited it so that it is copy-pastable. You can look above to see what I mean. Of course, you don't need to name it ack :)

– Frank
Mar 27 at 22:39

it works fine and do the Job!

– Pascal
Mar 27 at 23:11

But when I time it with Sys.time() on my data.table (811,000 x 16), it takes about 8 min on a 4-core i5 vPro 8th Gen with 16 GB RAM. Is there a way to reduce this duration, or should I consider it a reasonable time?

– Pascal
Mar 27 at 23:18

Thanks a lot for this solution! I will drink a lot of coffee while waiting for something better :)

– Pascal
Mar 28 at 0:04
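For anyone who wants to reproduce a timing like the one mentioned in the comments, a rough sketch using system.time() might look like the following. The synthetic table, its size (1,000 x 16, far smaller than the 811,000 x 16 table described above), and the names n_rows, timing and ranks are assumptions for illustration only; actual timings will depend on the data and the machine.

```r
library(data.table)

# Hypothetical synthetic data: 1,000 rows x 16 integer columns.
set.seed(1)
n_rows <- 1000L
dt <- as.data.table(matrix(sample(1:20, n_rows * 16L, replace = TRUE), ncol = 16L))

# Row-ID column used as the id variable for melt().
dt[, ack := .I]

# system.time() reports user, system, and elapsed seconds for the expression.
timing <- system.time({
  ranks <- melt(dt, id.vars = "ack")[
    , f := frank(value, na.last = "keep", ties.method = "dense"), by = ack][
    , dcast(.SD, ack ~ variable, value.var = "f")]
})
print(timing[["elapsed"]])
```

Increasing n_rows step by step gives a feel for how the run time grows before trying the full table.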

