Is there an R function for returning sorted indexes of any values of a vector?

















I'm not fluent in R's data.table, and any help resolving the following problem would be greatly appreciated!
I have a big data.table (~1,000,000 rows) with columns of numeric values, and I want to output a data.table of the same dimensions containing, for each value, its sorted index position within its row.



A short example:



Input:



dt = data.frame(ack = 1:7)

dt$A1 = c( 1, 6, 9, 10, 3, 5, NA)
dt$A2 = c( 25, 12, 30, 10, 50, 1, 30)
dt$A3 = c( 100, 63, 91, 110, 1, 4, 10)
dt$A4 = c( 51, 65, 2, 1, 0, 200, 1)


First row: 1 (1) <= 25 (2) <= 51 (3) <= 100 (4),
so the row's sorted index positions for (1, 25, 100, 51) are (1, 2, 4, 3), and the output should be:



dt$PosA1 = c(1, ...
dt$PosA2 = c(2, ...
dt$PosA3 = c(4, ...
dt$PosA4 = c(3, ...


Third row: 2 (1) <= 9 (2) <= 30 (3) <= 91 (4), so the output must be:



dt$PosA1 = c( 1,1,2,...)
dt$PosA2 = c( 2,2,3,...)
dt$PosA3 = c( 4,3,4,...)
dt$PosA4 = c( 3,4,1,...)


The output has the same dimensions as the input data.table and is filled with the row-wise sorted index positions:



dt$PosA1 = c( 1, 1, 2, 2, 3, 3, NA)
dt$PosA2 = c( 2, 2, 3, 3, 4, 1, 3)
dt$PosA3 = c( 4, 3, 4, 4, 2, 2, 2)
dt$PosA4 = c( 3, 4, 1, 1, 1, 4, 1)
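For a single row, these positions are simply the ranks of the row's values, which base R's rank() computes. Two details matter for matching the expected output: row 4 contains a tie (10 and 10), which the expected output breaks by column order (ties.method = "first" rather than the default "average"), and the NA in row 7 should stay NA (na.last = "keep"). A quick check on those rows:

```r
# Row 1: no ties, no NAs, so the positions are the plain ranks
rank(c(1, 25, 100, 51))                          # -> 1 2 4 3

# Row 4: the default ties.method = "average" splits the tie between A1 and A2...
rank(c(10, 10, 110, 1))                          # -> 2.5 2.5 4 1
# ...while ties.method = "first" breaks it by column order, as in the expected output
rank(c(10, 10, 110, 1), ties.method = "first")   # -> 2 3 4 1

# Row 7: na.last = "keep" leaves the missing value as NA instead of ranking it last
rank(c(NA, 30, 10, 1), na.last = "keep")         # -> NA 3 2 1
```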


I'm thinking of something like this:



library(data.table)
setDT(dt)

# pseudocode: rowPosition() does not exist, it is the function I'm looking for
dt[, `:=`(PosA1 = rowPosition(.SD, 1, na.rm = TRUE),
          PosA2 = rowPosition(.SD, 2, na.rm = TRUE),
          PosA3 = rowPosition(.SD, 3, na.rm = TRUE),
          PosA4 = rowPosition(.SD, 4, na.rm = TRUE)),
   .SDcols = c("A1", "A2", "A3", "A4")]


I'm not sure of the syntax, and I'm missing a rowPosition function (I named it rowPosition here). Does any function exist that does this?



A little help coding an efficient one, or another approach to the problem, would be great!



Regards.
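data.table has no rowPosition() function, but the := syntax itself is straightforward. A minimal sketch, assuming that grouping by the row id ack and ranking each row with frank() is acceptable; it is simple but likely slow for ~1,000,000 rows, since it evaluates one group per row. The vectors cols and pos_cols are names introduced here for illustration:

```r
library(data.table)

# The example data from the question, built directly as a data.table
dt <- data.table(ack = 1:7,
                 A1 = c(1, 6, 9, 10, 3, 5, NA),
                 A2 = c(25, 12, 30, 10, 50, 1, 30),
                 A3 = c(100, 63, 91, 110, 1, 4, 10),
                 A4 = c(51, 65, 2, 1, 0, 200, 1))

cols     <- c("A1", "A2", "A3", "A4")   # value columns to rank within each row
pos_cols <- paste0("Pos", cols)          # names of the new position columns

# For every row (one group per ack value), rank that row's values and spread
# the four ranks back into the Pos* columns by reference.
# ties.method = "first" breaks ties by column order; na.last = "keep" leaves NA as NA.
dt[, (pos_cols) := as.list(frank(unlist(.SD), na.last = "keep", ties.method = "first")),
   by = ack, .SDcols = cols]

dt
```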










      r function sorting data.table row






      edited Mar 27 at 21:30 by Frank

      asked Mar 27 at 20:38 by Pascal

























          2 Answers















          Since you are looking for speed, you might want to consider using Rcpp. An Rcpp rank function that handles NA and ties can be found in nrussell's adapted version of René Richter's code.



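          library(data.table)

          nr <- 811e3
          nc <- 16
          # random test data: nr rows x nc integer columns (V1..V16) with some NAs, plus a row id
          DT <- as.data.table(matrix(sample(c(1:200, NA), nr*nc, replace=TRUE), nrow=nr))[,
              ack := .I]

          # assuming that you have saved nrussell's code in avg_rank.cpp (here in an rcpp/ folder)
          library(Rcpp)
          system.time(sourceCpp("rcpp/avg_rank.cpp"))
          #   user  system elapsed
          #   0.00    0.13    6.21

          # rank each row's values with the compiled avg_rank()
          nruss_rcpp <- function() {
              DT[, as.list(avg_rank(unlist(.SD))), by=ack]
          }

          # the melt/frank/dcast approach from the other answer
          data.table.frank <- function() {
              melt(DT, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][,
                  dcast(.SD, ack ~ variable, value.var="f")]
          }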



          library(microbenchmark)
          microbenchmark(nruss_rcpp(), data.table.frank(), times=3L)


          Timings:

          Unit: seconds
                         expr       min        lq     mean   median        uq       max neval cld
                 nruss_rcpp()  10.33032  10.33251  10.3697  10.3347  10.38939  10.44408     3   a
           data.table.frank() 610.44869 612.82685 613.9362 615.2050 615.68001 616.15501     3   b



          Edit: addressing comments

          1) Set the column names of the rank columns by updating by reference:

          DT[, (paste0("Rank", 1L:nc)) := as.list(avg_rank(unlist(.SD))), by=ack]

          2) Keep NAs as they are:

          Option A) Set them back to NA in R after getting the output from avg_rank:



          for (j in 1:nc) 
          DT[is.na(get(paste0("V", j))), (paste0("Rank", j)) := NA_real_]
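Option A calls `[.data.table` once per column, which is cheap for 16 columns. A variant using data.table's set(), which avoids that per-call overhead inside a loop, might look like the sketch below. The small table and its hard-coded Rank values are made up purely for illustration; in the answer they come from avg_rank:

```r
library(data.table)

# Toy stand-in: value columns V1, V2 and rank columns in which the NA inputs
# still received a numeric rank (hard-coded here for illustration only).
DT <- data.table(V1 = c(5, NA, 2), V2 = c(1, 4, NA),
                 Rank1 = c(2, 2, 1), Rank2 = c(1, 1, 2))
nc <- 2

for (j in seq_len(nc)) {
  na_rows <- which(is.na(DT[[paste0("V", j)]]))   # rows where the input value was NA
  if (length(na_rows) > 0L)
    set(DT, i = na_rows, j = paste0("Rank", j), value = NA_real_)  # overwrite by reference
}

DT
```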



          Option B) Amend the avg_rank code in Rcpp as follows:



          Rcpp::NumericVector avg_rank(Rcpp::NumericVector x)
          {
              R_xlen_t sz = x.size();
              Rcpp::IntegerVector w = Rcpp::seq(0, sz - 1);
              std::sort(w.begin(), w.end(), Comparator(x));   // Comparator is defined in nrussell's avg_rank.cpp

              Rcpp::NumericVector r = Rcpp::no_init_vector(sz);
              for (R_xlen_t n, i = 0; i < sz; i += n) {
                  n = 1;
                  while (i + n < sz && x[w[i]] == x[w[i + n]]) ++n;
                  for (R_xlen_t k = 0; k < n; k++) {
                      if (Rcpp::traits::is_na<REALSXP>(x[w[i + k]]))   // additional code
                          r[w[i + k]] = NA_REAL;                        // additional code
                      else
                          r[w[i + k]] = i + (n + 1) / 2.;
                  }
              }

              return r;
          }






          edited Apr 3 at 0:54 (community wiki, 4 revs, chinsoon12)



























          • Hello @chinsoon12, that would be great, but I don't know how to make avg_rank available in my RStudio environment (library(Rcpp) is not sufficient), and I don't know how to do the "#assuming that you have saved nrussell code in avg_rank.cpp" part.

            – Pascal
            Mar 29 at 16:10











          • Sorry for my limited knowledge of R :(

            – Pascal
            Mar 29 at 16:12











          • I read TFM, got Rtools installed, sourced avg_rank.cpp and ran it again and... Great!!! 20 s instead of 8 min!!! If I may push my luck: I would like NA values to stay NA, and to keep the column names instead of V1...VN. Thanks a lot!

            – Pascal
            Mar 29 at 17:45











          • You got so far in a few hours. These last 2 questions are nothing to you.

            – chinsoon12
            Mar 30 at 0:13











          • :)) Thanks for encouraging me to read harder. I solved the "columns" question (dt[, (cols) := ....]), but inspecting and modifying nrussell's code is too hard for me at the moment. As a workaround, I could compare the result table with the original and keep the result value where the original is not NA, else NA. But the smart way, in one call, would be to give avg_rank() a parameter like na.last = "keep" to handle this case.

            – Pascal
            Mar 30 at 9:33
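For readers who don't have nrussell's avg_rank.cpp at hand (the situation in the first comment above), a self-contained stand-in can be compiled inline with Rcpp::cppFunction(), which still needs a working compiler toolchain (Rtools on Windows). The sketch below is an illustrative reimplementation, not nrussell's code; the name avg_rank_sketch is hypothetical, and it already folds in option B: NAs are sorted last and returned as NA, while ties receive the average of their ranks.

```r
library(Rcpp)

# avg_rank_sketch() is a hypothetical name; this is an illustrative
# reimplementation, not nrussell's original avg_rank.cpp.
cppFunction(includes = "#include <algorithm>", code = '
NumericVector avg_rank_sketch(NumericVector x) {
  R_xlen_t sz = x.size();
  IntegerVector w = seq(0, sz - 1);
  // Order the indices by value, placing NA/NaN after every non-missing value.
  std::sort(w.begin(), w.end(), [&x](int a, int b) {
    if (ISNAN(x[a])) return false;
    if (ISNAN(x[b])) return true;
    return x[a] < x[b];
  });
  NumericVector r(sz);
  for (R_xlen_t n, i = 0; i < sz; i += n) {
    // n = length of the run of tied values starting at sorted position i
    // (NA never compares equal, so each NA forms its own run)
    n = 1;
    while (i + n < sz && x[w[i]] == x[w[i + n]]) ++n;
    for (R_xlen_t k = 0; k < n; k++)
      r[w[i + k]] = ISNAN(x[w[i]]) ? NA_REAL : i + (n + 1) / 2.0;
  }
  return r;
}')

avg_rank_sketch(c(10, 10, 110, 1))   # the tie gets the average rank 2.5
avg_rank_sketch(c(NA, 30, 10, 1))    # the NA stays NA
```

Once compiled, it can be used in place of avg_rank in the DT[, ..., by=ack] call above.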


































          You can convert to long form and use rank. Or, since you're using data.table, frank:



          library(data.table)
          setDT(dt)
          melt(dt, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][,
          dcast(.SD, ack ~ variable, value.var="f")]

          ack A1 A2 A3 A4
          1: 1 1 2 4 3
          2: 2 1 2 3 4
          3: 3 2 3 4 1
          4: 4 2 2 3 1
          5: 5 3 4 2 1
          6: 6 3 1 2 4
          7: 7 NA 3 2 1


          melt switches to long form, while dcast converts back to wide form.
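Note that ties.method="dense" gives 2 2 3 1 for row 4 (A1 and A2 are both 10), whereas the question's expected output breaks that tie by column order (2 3 4 1). Switching to ties.method="first" should match that tie handling; a sketch of the same melt/frank/dcast steps, split up for readability (long and res are names introduced here):

```r
library(data.table)

dt <- data.table(ack = 1:7,
                 A1 = c(1, 6, 9, 10, 3, 5, NA),
                 A2 = c(25, 12, 30, 10, 50, 1, 30),
                 A3 = c(100, 63, 91, 110, 1, 4, 10),
                 A4 = c(51, 65, 2, 1, 0, 200, 1))

long <- melt(dt, id.vars = "ack")    # one row per (ack, variable, value), A1 rows first
# rank within each original row; "first" breaks ties in the order A1, A2, A3, A4
long[, f := frank(value, na.last = "keep", ties.method = "first"), by = ack]
res <- dcast(long, ack ~ variable, value.var = "f")   # back to wide form

res
```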































          • Thanks @Frank, but I get an error: Error in melt.data.table(dt, id = "ack") : One or more values in 'id.vars' is invalid.

            – Pascal
            Mar 27 at 22:19












          • @Pascal You will need to create a row-ID column, like dt[, ack := .I] or dt$ack <- seq_len(nrow(dt)). I'm using the code from your post after I edited it so that it is copy-pastable. You can look above to see what I mean. Of course, you don't need to name it ack :)

            – Frank
            Mar 27 at 22:39











          • It works fine and does the job!

            – Pascal
            Mar 27 at 23:11











          • But when I time it with Sys.time() on my data.table (811,000 x 16), it takes about 8 min on a 4-core i5 vPro 8th gen with 16 GB of RAM. Is there a way to reduce this, or should I consider that a good time?

            – Pascal
            Mar 27 at 23:18






          • Thanks a lot for this solution! I'll drink a lot of coffee while waiting for something better :)

            – Pascal
            Mar 28 at 0:04
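Regarding the ~8 minute runtime: much of the cost in the melt/frank/dcast version is likely the grouping over 811,000 rows. A possible pure-R alternative, not benchmarked here, is to sort all cells once by (row, value) with base order() and then number the cells within each row. This gives ties.method="first" semantics, because order() leaves unresolved ties in their original order and, in column-major storage, a cell's index increases with its column; NA cells are skipped and stay NA. The names m, row_id, keep, o and pos_vec are introduced for illustration:

```r
# Example data from the question (a base data.frame is enough here)
dt <- data.frame(ack = 1:7,
                 A1 = c(1, 6, 9, 10, 3, 5, NA),
                 A2 = c(25, 12, 30, 10, 50, 1, 30),
                 A3 = c(100, 63, 91, 110, 1, 4, 10),
                 A4 = c(51, 65, 2, 1, 0, 200, 1))
cols <- c("A1", "A2", "A3", "A4")

m <- as.matrix(dt[, cols])
nr <- nrow(m)
row_id <- rep(seq_len(nr), times = ncol(m))   # row of each cell, in column-major order
vals <- as.vector(m)                           # all cells, column by column

# One sort over all non-NA cells: by row, then by value (ties keep column order).
keep <- which(!is.na(vals))
o <- keep[order(row_id[keep], vals[keep])]

# Sorted cells are grouped by row, so numbering 1..k within each row gives the position.
pos_vec <- rep(NA_integer_, length(vals))
pos_vec[o] <- sequence(tabulate(row_id[o], nbins = nr))

pos <- matrix(pos_vec, nrow = nr, dimnames = list(NULL, paste0("Pos", cols)))
cbind(dt, pos)
```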













          Your Answer






          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "1"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55386049%2fis-there-an-r-function-for-returning-sorted-indexes-of-any-values-of-a-vector%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          2 Answers
          2






          active

          oldest

          votes








          2 Answers
          2






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1















          Since you are looking for speed, you might want to consider using Rcpp. A Rcpp rank that takes care of NA and ties can be found in nrussell's adapted version of René Richter's code.



          nr <- 811e3
          nc <- 16
          DT <- as.data.table(matrix(sample(c(1:200, NA), nr*nc, replace=TRUE), nrow=nr))[,
          ack := .I]

          #assuming that you have saved nrussell code in avg_rank.cpp
          library(Rcpp)
          system.time(sourceCpp("rcpp/avg_rank.cpp"))
          # user system elapsed
          # 0.00 0.13 6.21

          nruss_rcpp <- function()
          DT[, as.list(avg_rank(unlist(.SD))), by=ack]


          data.table.frank <- function()
          melt(DT, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][,
          dcast(.SD, ack ~ variable, value.var="f")]



          library(microbenchmark)
          microbenchmark(nruss_rcpp(), data.table.frank(), times=3L)


          timings:



          Unit: seconds
          expr min lq mean median uq max neval cld
          nruss_rcpp() 10.33032 10.33251 10.3697 10.3347 10.38939 10.44408 3 a
          data.table.frank() 610.44869 612.82685 613.9362 615.2050 615.68001 616.15501 3 b



          edit: addressing comments



          1) set column names for rank columns using updating by reference



          DT[, (paste0("Rank", 1L:nc)) := as.list(avg_rank(unlist(.SD))), by=ack]


          2) keeping NAs as it is



          option A) change to NA in R after getting output from avg_rank:



          for (j in 1:nc) 
          DT[is.na(get(paste0("V", j))), (paste0("Rank", j)) := NA_real_]



          option B) amend the avg_rank code in Rcpp as follows:



          Rcpp::NumericVector avg_rank(Rcpp::NumericVector x)

          R_xlen_t sz = x.size();
          Rcpp::IntegerVector w = Rcpp::seq(0, sz - 1);
          std::sort(w.begin(), w.end(), Comparator(x));

          Rcpp::NumericVector r = Rcpp::no_init_vector(sz);
          for (R_xlen_t n, i = 0; i < sz; i += n)
          n = 1;
          while (i + n < sz && x[w[i]] == x[w[i + n]]) ++n;
          for (R_xlen_t k = 0; k < n; k++)
          if (Rcpp::traits::is_na<REALSXP>(x[w[i + k]])) #additional code
          r[w[i + k]] = NA_REAL; #additional code
          else
          r[w[i + k]] = i + (n + 1) / 2.;




          return r;






          share|improve this answer



























          • hello @chinsoon12, should be great but i don't now how avg_rank can be available from my Rstudio envt (library(Rcpp) is not sufficient , and i don't know how to ```` #assuming that you have saved nrussell code in avg_rank.cpp.

            – Pascal
            Mar 29 at 16:10











          • sorry for my low-level knowledge in R :(

            – Pascal
            Mar 29 at 16:12











          • I red TFM, got Rtool installed and source avg_rank.cpp and launch again and.... Greaaaaaat !!! 20s instead 8mn !!!! If I can abuse. I would like NA value stay NA and keep Columns name instead V1...VN. Thx a lot !!!!!!

            – Pascal
            Mar 29 at 17:45











          • You got so far in a few hours. These last 2 questions are nothing to you.

            – chinsoon12
            Mar 30 at 0:13











          • :)) thanks too encourage me to read harder. I resolve "columns" question (dt[, (cols) = ....]), but inspecting and modifying nrussel code is too hard for me at the moment. So i can get around in looking for a way to compare values of result table and orig and print result values if not NA else NA. But the smart way, in one call, would be to give avg_rank( ) a parameter like na.last = "keep" to take this exception in count).

            – Pascal
            Mar 30 at 9:33
















          1















          Since you are looking for speed, you might want to consider using Rcpp. A Rcpp rank that takes care of NA and ties can be found in nrussell's adapted version of René Richter's code.



          nr <- 811e3
          nc <- 16
          DT <- as.data.table(matrix(sample(c(1:200, NA), nr*nc, replace=TRUE), nrow=nr))[,
          ack := .I]

          #assuming that you have saved nrussell code in avg_rank.cpp
          library(Rcpp)
          system.time(sourceCpp("rcpp/avg_rank.cpp"))
          # user system elapsed
          # 0.00 0.13 6.21

          nruss_rcpp <- function()
          DT[, as.list(avg_rank(unlist(.SD))), by=ack]


          data.table.frank <- function()
          melt(DT, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][,
          dcast(.SD, ack ~ variable, value.var="f")]



          library(microbenchmark)
          microbenchmark(nruss_rcpp(), data.table.frank(), times=3L)


          timings:



          Unit: seconds
          expr min lq mean median uq max neval cld
          nruss_rcpp() 10.33032 10.33251 10.3697 10.3347 10.38939 10.44408 3 a
          data.table.frank() 610.44869 612.82685 613.9362 615.2050 615.68001 616.15501 3 b



          edit: addressing comments



          1) set column names for rank columns using updating by reference



          DT[, (paste0("Rank", 1L:nc)) := as.list(avg_rank(unlist(.SD))), by=ack]


          2) keeping NAs as it is



          option A) change to NA in R after getting output from avg_rank:



          for (j in 1:nc) 
          DT[is.na(get(paste0("V", j))), (paste0("Rank", j)) := NA_real_]



          option B) amend the avg_rank code in Rcpp as follows:



          Rcpp::NumericVector avg_rank(Rcpp::NumericVector x)

          R_xlen_t sz = x.size();
          Rcpp::IntegerVector w = Rcpp::seq(0, sz - 1);
          std::sort(w.begin(), w.end(), Comparator(x));

          Rcpp::NumericVector r = Rcpp::no_init_vector(sz);
          for (R_xlen_t n, i = 0; i < sz; i += n)
          n = 1;
          while (i + n < sz && x[w[i]] == x[w[i + n]]) ++n;
          for (R_xlen_t k = 0; k < n; k++)
          if (Rcpp::traits::is_na<REALSXP>(x[w[i + k]])) #additional code
          r[w[i + k]] = NA_REAL; #additional code
          else
          r[w[i + k]] = i + (n + 1) / 2.;




          return r;






          share|improve this answer



























          • hello @chinsoon12, should be great but i don't now how avg_rank can be available from my Rstudio envt (library(Rcpp) is not sufficient , and i don't know how to ```` #assuming that you have saved nrussell code in avg_rank.cpp.

            – Pascal
            Mar 29 at 16:10











          • sorry for my low-level knowledge in R :(

            – Pascal
            Mar 29 at 16:12











          • I red TFM, got Rtool installed and source avg_rank.cpp and launch again and.... Greaaaaaat !!! 20s instead 8mn !!!! If I can abuse. I would like NA value stay NA and keep Columns name instead V1...VN. Thx a lot !!!!!!

            – Pascal
            Mar 29 at 17:45











          • You got so far in a few hours. These last 2 questions are nothing to you.

            – chinsoon12
            Mar 30 at 0:13











          • :)) thanks too encourage me to read harder. I resolve "columns" question (dt[, (cols) = ....]), but inspecting and modifying nrussel code is too hard for me at the moment. So i can get around in looking for a way to compare values of result table and orig and print result values if not NA else NA. But the smart way, in one call, would be to give avg_rank( ) a parameter like na.last = "keep" to take this exception in count).

            – Pascal
            Mar 30 at 9:33














          1














          1










          1









          Since you are looking for speed, you might want to consider using Rcpp. A Rcpp rank that takes care of NA and ties can be found in nrussell's adapted version of René Richter's code.



          nr <- 811e3
          nc <- 16
          DT <- as.data.table(matrix(sample(c(1:200, NA), nr*nc, replace=TRUE), nrow=nr))[,
          ack := .I]

          #assuming that you have saved nrussell code in avg_rank.cpp
          library(Rcpp)
          system.time(sourceCpp("rcpp/avg_rank.cpp"))
          # user system elapsed
          # 0.00 0.13 6.21

          nruss_rcpp <- function()
          DT[, as.list(avg_rank(unlist(.SD))), by=ack]


          data.table.frank <- function()
          melt(DT, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][,
          dcast(.SD, ack ~ variable, value.var="f")]



          library(microbenchmark)
          microbenchmark(nruss_rcpp(), data.table.frank(), times=3L)


          timings:



          Unit: seconds
          expr min lq mean median uq max neval cld
          nruss_rcpp() 10.33032 10.33251 10.3697 10.3347 10.38939 10.44408 3 a
          data.table.frank() 610.44869 612.82685 613.9362 615.2050 615.68001 616.15501 3 b



          edit: addressing comments



          1) set column names for rank columns using updating by reference



          DT[, (paste0("Rank", 1L:nc)) := as.list(avg_rank(unlist(.SD))), by=ack]


          2) keeping NAs as it is



          option A) change to NA in R after getting output from avg_rank:



          for (j in 1:nc) 
          DT[is.na(get(paste0("V", j))), (paste0("Rank", j)) := NA_real_]



          option B) amend the avg_rank code in Rcpp as follows:



          Rcpp::NumericVector avg_rank(Rcpp::NumericVector x)

          R_xlen_t sz = x.size();
          Rcpp::IntegerVector w = Rcpp::seq(0, sz - 1);
          std::sort(w.begin(), w.end(), Comparator(x));

          Rcpp::NumericVector r = Rcpp::no_init_vector(sz);
          for (R_xlen_t n, i = 0; i < sz; i += n)
          n = 1;
          while (i + n < sz && x[w[i]] == x[w[i + n]]) ++n;
          for (R_xlen_t k = 0; k < n; k++)
          if (Rcpp::traits::is_na<REALSXP>(x[w[i + k]])) #additional code
          r[w[i + k]] = NA_REAL; #additional code
          else
          r[w[i + k]] = i + (n + 1) / 2.;




          return r;






          share|improve this answer















          Since you are looking for speed, you might want to consider using Rcpp. A Rcpp rank that takes care of NA and ties can be found in nrussell's adapted version of René Richter's code.



          nr <- 811e3
          nc <- 16
          DT <- as.data.table(matrix(sample(c(1:200, NA), nr*nc, replace=TRUE), nrow=nr))[,
          ack := .I]

          #assuming that you have saved nrussell code in avg_rank.cpp
          library(Rcpp)
          system.time(sourceCpp("rcpp/avg_rank.cpp"))
          # user system elapsed
          # 0.00 0.13 6.21

          nruss_rcpp <- function()
          DT[, as.list(avg_rank(unlist(.SD))), by=ack]


          data.table.frank <- function()
          melt(DT, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][,
          dcast(.SD, ack ~ variable, value.var="f")]



          library(microbenchmark)
          microbenchmark(nruss_rcpp(), data.table.frank(), times=3L)


          timings:



          Unit: seconds
          expr min lq mean median uq max neval cld
          nruss_rcpp() 10.33032 10.33251 10.3697 10.3347 10.38939 10.44408 3 a
          data.table.frank() 610.44869 612.82685 613.9362 615.2050 615.68001 616.15501 3 b



          edit: addressing comments



          1) set column names for rank columns using updating by reference



          DT[, (paste0("Rank", 1L:nc)) := as.list(avg_rank(unlist(.SD))), by=ack]


          2) keeping NAs as it is



          option A) change to NA in R after getting output from avg_rank:



          for (j in 1:nc) 
          DT[is.na(get(paste0("V", j))), (paste0("Rank", j)) := NA_real_]



          option B) amend the avg_rank code in Rcpp as follows:



          Rcpp::NumericVector avg_rank(Rcpp::NumericVector x)

          R_xlen_t sz = x.size();
          Rcpp::IntegerVector w = Rcpp::seq(0, sz - 1);
          std::sort(w.begin(), w.end(), Comparator(x));

          Rcpp::NumericVector r = Rcpp::no_init_vector(sz);
          for (R_xlen_t n, i = 0; i < sz; i += n)
          n = 1;
          while (i + n < sz && x[w[i]] == x[w[i + n]]) ++n;
          for (R_xlen_t k = 0; k < n; k++)
          if (Rcpp::traits::is_na<REALSXP>(x[w[i + k]])) #additional code
          r[w[i + k]] = NA_REAL; #additional code
          else
          r[w[i + k]] = i + (n + 1) / 2.;




          return r;







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Apr 3 at 0:54


























          community wiki





          4 revs
          chinsoon12
















          • hello @chinsoon12, should be great but i don't now how avg_rank can be available from my Rstudio envt (library(Rcpp) is not sufficient , and i don't know how to ```` #assuming that you have saved nrussell code in avg_rank.cpp.

            – Pascal
            Mar 29 at 16:10











          • Sorry for my limited knowledge of R :(

            – Pascal
            Mar 29 at 16:12











          • I read the manual, installed Rtools, sourced avg_rank.cpp and ran it again and... great!!! 20 s instead of 8 min!!! If I may push my luck: I would like NA values to stay NA, and to keep the column names instead of V1...VN. Thanks a lot!!!

            – Pascal
            Mar 29 at 17:45











          • You got so far in a few hours. These last 2 questions are nothing to you.

            – chinsoon12
            Mar 30 at 0:13











          • :)) Thanks for encouraging me to read harder. I solved the "columns" question (dt[, (cols) := ....]), but inspecting and modifying nrussell's code is too hard for me at the moment. As a workaround, I could compare the result table with the original and keep the result value where the original is not NA, else NA. But the smart way, in one call, would be to give avg_rank() a parameter like na.last = "keep" to handle this case.

            – Pascal
            Mar 30 at 9:33
































          2















          You can convert to long form and use rank. Or, since you're using data.table, frank:



          library(data.table)
          setDT(dt)
          melt(dt, id="ack")[, f := frank(value, na.last="keep", ties.method="dense"), by=ack][,
          dcast(.SD, ack ~ variable, value.var="f")]

             ack A1 A2 A3 A4
          1:   1  1  2  4  3
          2:   2  1  2  3  4
          3:   3  2  3  4  1
          4:   4  2  2  3  1
          5:   5  3  4  2  1
          6:   6  3  1  2  4
          7:   7 NA  3  2  1


          melt switches to long form, while dcast converts back to wide form.
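Conceptually, melting, ranking by ack and casting back ranks each row of the wide table independently. A minimal C++ sketch of that per-row dense ranking, assuming NaN represents NA and using hypothetical names `dense_rank_keep_na` and `rank_rows` (this is only an illustration of the result, not how data.table implements frank):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring frank(value, na.last = "keep", ties.method = "dense")
// for one row: equal values share a rank, ranks have no gaps, NaN (NA) stays NaN.
std::vector<double> dense_rank_keep_na(const std::vector<double>& row) {
    // Distinct non-NaN values in ascending order
    std::vector<double> distinct;
    for (double v : row)
        if (!std::isnan(v)) distinct.push_back(v);
    std::sort(distinct.begin(), distinct.end());
    distinct.erase(std::unique(distinct.begin(), distinct.end()), distinct.end());

    std::vector<double> out(row.size(), std::nan(""));
    for (std::size_t j = 0; j < row.size(); ++j) {
        if (std::isnan(row[j])) continue;  // keep NA as NA
        // Dense rank = 1 + number of distinct values strictly smaller than row[j]
        auto pos = std::lower_bound(distinct.begin(), distinct.end(), row[j]);
        out[j] = static_cast<double>(pos - distinct.begin()) + 1.0;
    }
    return out;
}

// Wide table in, wide table of ranks out: each row is ranked on its own,
// which is what grouping the long table by ack and casting back achieves.
std::vector<std::vector<double>> rank_rows(const std::vector<std::vector<double>>& table) {
    std::vector<std::vector<double>> ranks;
    ranks.reserve(table.size());
    for (const auto& row : table) ranks.push_back(dense_rank_keep_na(row));
    return ranks;
}
```

With the made-up row {5, 5, 7, 2} the dense ranks are {2, 2, 3, 1}: tied values share a rank and the next rank is not skipped. R's default ties method ("average") would instead give {2.5, 2.5, 4, 1}.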































          • Thanks @Frank, but I get an error: Error in melt.data.table(dt, id = "ack") : One or more values in 'id.vars' is invalid.

            – Pascal
            Mar 27 at 22:19












          • @Pascal You will need to create a row-ID column, like dt[, ack := .I] or dt$ack <- seq_len(nrow(dt)). I'm using the code from your post after I edited it so that it is copy-pastable. You can look above to see what I mean. Of course, you don't need to name it ack :)

            – Frank
            Mar 27 at 22:39











          • It works fine and does the job!

            – Pascal
            Mar 27 at 23:11











          • But when I time it with Sys.time() on my data.table (811000 x 16), it takes about 8 min on a 4-core i5 vPro 8th Gen with 16 GB RAM. Is there a way to reduce this duration, or should I consider it a good result?

            – Pascal
            Mar 27 at 23:18






          • Thanks a lot for this solution! I'll be drinking a lot of coffee while waiting for something faster :)!

            – Pascal
            Mar 28 at 0:04

























          answered Mar 27 at 21:35









          Frank, 59.6k reputation (6 gold, 67 silver, 143 bronze badges)































