Rank subset into quantiles using Ntile

I have a dataset containing 42840 observations spanning 119 unique months (Dataset$date). I want to assign a quantile to every Dataset$Value within each month, ranking the values from 1 (lowest) to 5 (highest).



Date     Name(ID)  Value  Quantile
2009-03  1         35     (1-5)
2009-04  1         20     ...
2009-05  1         65     ...
2009-03  2         24     ...
2009-04  2         77     ...
2009-03  3         110    ...
.
.
.
2018-12  3         125    ...
2009-03  56        24     ...
2009-04  56        65     ...
2009-03  57        26     ...
2009-04  57        67     ...
2009-03  58        99     ...

(Quantile is the column I want to add, assigning each value a quantile from 1 to 5.)


I've tried the Ntile function, which works great on the whole dataset, but I can't find a way to apply it separately to each date subset.



Any suggestions?










r subset quantile






asked Mar 21 at 21:57









Sondre Fiskerstrand












  • What package is the Ntile function from? Why can't you just subset your data using square bracket notation and then pass that new, subset data frame into your function?

    – divibisan
    Mar 21 at 22:33
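One way to read the suggestion in the comment above, as a rough sketch rather than the commenter's own code: loop over the months, pick out each month's rows with square-bracket indexing, and bin only that subset. The data frame `dat` and the helper `quintile()` are hypothetical stand-ins (the real data has 119 months), and the binning rule is an assumption, not the asker's Ntile function.

```r
# Hypothetical stand-in for the asker's data: 3 months x 10 IDs
set.seed(1)
dat <- data.frame(
  Date  = rep(c("2009-03", "2009-04", "2009-05"), each = 10),
  Name  = rep(1:10, times = 3),
  Value = runif(30, 0, 100)
)

# Hypothetical helper: equal-sized bins from ranks (1 = lowest, 5 = highest)
quintile <- function(v) {
  r <- rank(v, ties.method = "first")
  as.integer(ceiling(5 * r / length(v)))
}

# Subset one month at a time with `[` and write the bins back
dat$Quantile <- NA_integer_
for (m in unique(dat$Date)) {
  rows <- dat$Date == m                      # logical index for this month
  dat$Quantile[rows] <- quintile(dat$Value[rows])
}

# Cross-tabulate months against bins to check the split
counts <- table(dat$Date, dat$Quantile)
print(counts)
```

With 10 rows per month, each month ends up with two rows in each of the five bins.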

















1 Answer
You could use the base rank function with dplyr's group_by:



library(dplyr)

# Create some example data: 12 dates with N observations each
N <- 3
dat <- tibble(
  date = rep(1:12, N),
  value = runif(12 * N, 0, 100)
)

# The rescale function we will use later to fit your 1-5 scale
## Adapted from https://stackoverflow.com/questions/25962508/rescaling-a-variable-in-r
RESCALE <- function(x, nx1, nx2, minx, maxx) {
  nx <- nx1 + (nx2 - nx1) * (x - minx) / (maxx - minx)
  return(ceiling(nx))
}

# What you want
dat %>%
  group_by(date) %>% # group by date so that mutate() computes the ranks within each month
  mutate(
    rank_detail = rank(value), # rank the values within each group
    rank_group = RESCALE(rank_detail, 1, 5, min(rank_detail), max(rank_detail)) # rescale the ranks to your 1-5 scale
  ) %>%
  arrange(date)

# A tibble: 36 x 4
# # Groups: date [12]
#    date value rank_detail rank_group
#   <int> <dbl>       <dbl>      <dbl>
# 1     1  92.7           3          5
# 2     1  53.6           2          3
# 3     1  47.8           1          1
# 4     2  24.6           2          3
# 5     2  72.2           3          5
# 6     2  11.5           1          1





answered Mar 21 at 23:17









Ismail Müller












  • It seems that the value 1 in rank_group is only assigned if rank_detail is 1 as well. This results in 119 values in rank 1, which is the number of months (03/2009-12/2018 + Global). Based on the total number of values, each rank should have 8568 values. Do you have any suggestions for a solution? Otherwise, the code works great!

    – Sondre Fiskerstrand
    Mar 22 at 10:03











  • Hi there. This is probably due to return(ceiling(nx)) in the RESCALE function. Try return(round(nx)) instead.

    – Ismail Müller
    Mar 24 at 19:11











  • It now seems that the function isn't dividing the subset equally. For each month the quantiles are as follows: Quantile 1: 45 observations, Quantile 2: 90 observations, Quantile 3: 90 observations, Quantile 4: 90 observations, Quantile 5: 45 observations. Do you know why this is?

    – Sondre Fiskerstrand
    Mar 25 at 10:23












  • Do you need the ranking of Value to be done for each month? In that case, if you don't have the same number of observations in each month, you won't get equal-sized groups when you look at the entire dataset!

    – Ismail Müller
    Mar 26 at 18:43
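The counts reported in this thread can be reproduced with a small sketch, assuming each month holds 360 observations (42840 / 119, which also matches 45 + 90 + 90 + 90 + 45). The `rescale()` helper is a hypothetical variant of RESCALE that takes the rounding function as an argument. It is also an assumption that the asker's Ntile is dplyr's `ntile()`, and the grouped data at the end is made up for illustration.

```r
library(dplyr)

# Hypothetical variant of RESCALE with the rounding function as a parameter
rescale <- function(x, nx1, nx2, minx, maxx, rounder) {
  rounder(nx1 + (nx2 - nx1) * (x - minx) / (maxx - minx))
}

ranks <- 1:360  # ranks within one month, assuming 360 observations

by_ceiling <- table(rescale(ranks, 1, 5, 1, 360, ceiling))
by_round   <- table(rescale(ranks, 1, 5, 1, 360, round))
by_ntile   <- table(ntile(ranks, 5))

print(by_ceiling)  # group sizes: 1, 89, 90, 90, 90
print(by_round)    # group sizes: 45, 90, 90, 90, 45
print(by_ntile)    # group sizes: 72 in every group

# Grouped ntile() on made-up data: 3 months of 360 values each
set.seed(42)
dat <- tibble(
  date  = rep(1:3, each = 360),
  value = runif(3 * 360, 0, 100)
)

binned <- dat %>%
  group_by(date) %>%                    # bin within each month
  mutate(quantile = ntile(value, 5)) %>%
  ungroup()

per_bin <- count(binned, date, quantile)
print(per_bin)     # 72 rows for every date/quantile pair
```

With `ceiling`, only rank 1 lands in group 1. With `round`, the two end groups cover half-width intervals, hence 45 instead of 90. `ntile()` splits by position within the group, so each month is divided evenly.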
















