Rank subset into quantiles using Ntile


I have a dataset containing 42840 observations spanning 119 unique months (Dataset$date). I want to assign a quantile to every Dataset$Value within each month, ranking the values from 1 (lowest value) to 5 (highest value).

 Date     Name(ID)  Value  Quantile
 2009-03  1         35     (1-5)
 2009-04  1         20     ...
 2009-05  1         65     ...
 2009-03  2         24     ...
 2009-04  2         77     ...
 2009-03  3         110    ...
 .
 .
 .
 2018-12  3         125    ...
 2009-03  56        24     ...
 2009-04  56        65     ...
 2009-03  57        26     ...
 2009-04  57        67     ...
 2009-03  58        99     ...

(Quantile is the column I want to add, assigning each value a quantile from 1 to 5.)

I've tried the Ntile function, which works great on the whole dataset, but I can't find a way to apply it separately to each date subset.

Any suggestions?

asked Mar 21 at 21:57

Sondre Fiskerstrand


What package is the Ntile function from? Why can't you just subset your data using square bracket notation and then pass that new, subset data frame into your function?

– divibisan
Mar 21 at 22:33
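The subsetting idea from this comment could be sketched roughly as follows. This is only an illustration under assumptions: the question never says which package `Ntile` comes from, so the sketch assumes `dplyr::ntile()`, and the small `dat` data frame is made up to mirror the Date/Value columns in the question.

```r
library(dplyr)  # assumption: "Ntile" in the question refers to dplyr::ntile()

# Hypothetical miniature of the asker's data: 2 months, 10 values each
dat <- data.frame(
  Date  = rep(c("2009-03", "2009-04"), each = 10),
  Value = c(35, 24, 110, 26, 99, 12, 58, 71, 43, 87,
            20, 77, 65, 67, 31, 94, 15, 52, 48, 80)
)

# One month at a time: subset with `[`, then bucket that subset into 5 groups
march <- dat[dat$Date == "2009-03", ]
march$Quantile <- ntile(march$Value, 5)

# All months at once: ave() applies the function separately within each Date
dat$Quantile <- ave(dat$Value, dat$Date, FUN = function(v) ntile(v, 5))
```

Subsetting month by month works but needs a loop over 119 months; `ave()` (or a grouped `mutate()`) does the same split-apply-combine in one call.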

Tags: r, subset, quantile

1 Answer
You could use the base rank function with dplyr's group_by:

library(dplyr)

# Create some data
N <- 3
dat <- tibble(
 date = rep(1:12,N),
 value = runif(12*N, 0, 100)
)

# The rescale function we will use later to fit your 1-5 scale
## Adapted from https://stackoverflow.com/questions/25962508/rescaling-a-variable-in-r
RESCALE <- function(x, nx1, nx2, minx, maxx) {
  # Map x linearly from [minx, maxx] onto [nx1, nx2], then round up
  nx <- nx1 + (nx2 - nx1) * (x - minx) / (maxx - minx)
  return(ceiling(nx))
}

# What you want
dat %>%
  group_by(date) %>% # group by date so that mutate() computes the ranks for each month
  mutate(rank_detail = rank(value), # rank the values within each group
         rank_group = RESCALE(rank_detail, 1, 5, min(rank_detail), max(rank_detail))) %>% # rescale the ranks to your 1-to-5 scale
  arrange(date)

# A tibble: 36 x 4
# Groups:   date [12]
#    date value rank_detail rank_group
#   <int> <dbl>       <dbl>      <dbl>
# 1     1  92.7           3          5
# 2     1  53.6           2          3
# 3     1  47.8           1          1
# 4     2  24.6           2          3
# 5     2  72.2           3          5
# 6     2  11.5           1          1

answered Mar 21 at 23:17

Ismail Müller


It seems that the value 1 in rank_group is only assigned if rank_detail is 1 as well. This results in 119 values in rank 1, which is the number of months (03/2009-12/2018 + Global). Based on the total number of values, each rank should have 8568 values. Do you have any suggestions for a solution? Otherwise, the code works great!

– Sondre Fiskerstrand
Mar 22 at 10:03

Hi there. This is probably due to return(ceiling(nx)) in the RESCALE function. Try return(round(nx)) instead.

– Ismail Müller
Mar 24 at 19:11

It now seems that the function isn't dividing each subset equally. For each month the quantiles are as follows: Quantile 1: 45 observations, Quantile 2: 90 observations, Quantile 3: 90 observations, Quantile 4: 90 observations, Quantile 5: 45 observations. Do you know why this is?

– Sondre Fiskerstrand
Mar 25 at 10:23
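The counts reported in the two comments above can be reproduced with a few lines of base R. This is a sketch under assumptions: it takes 360 untied observations per month (42840 / 119) and uses a hypothetical helper `rescale_linear()` that mirrors the linear part of the answer's RESCALE. Rounding to the nearest integer gives the end buckets only half the width of the middle ones on the 1-to-5 scale, and `ceiling()` leaves bucket 1 with only the single lowest rank.

```r
# Linear rescale of ranks onto [1, 5], as in the answer's RESCALE before rounding
# (rescale_linear is a hypothetical helper name used only for this illustration)
rescale_linear <- function(r) 1 + (5 - 1) * (r - min(r)) / (max(r) - min(r))

ranks <- 1:360               # assumed: 360 untied observations in one month
nx <- rescale_linear(ranks)  # continuous values from 1 to 5

# ceiling(): only the lowest rank maps to exactly 1
tabulate(ceiling(nx), nbins = 5)  # 1 89 90 90 90

# round(): buckets 1 and 5 cover [1, 1.5) and [4.5, 5], half the width of the others
tabulate(round(nx), nbins = 5)    # 45 90 90 90 45
```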

Do you need the ranking of Value to be done for each month? In that case, if you don't have the same number of observations in each month, you won't get equal-sized groups when you look at your entire dataset!

– Ismail Müller
Mar 26 at 18:43
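If equal-sized groups within each month are the goal, one option is to skip the rescaling step and call an n-tile function inside the grouped `mutate()`. The sketch below assumes `dplyr::ntile()` is the function the question calls Ntile; the data, the seed, and the column name `quintile` are illustrative only.

```r
library(dplyr)

set.seed(1)  # only to make the illustrative data reproducible
dat <- tibble(
  date  = rep(1:12, each = 10),  # hypothetical: 12 months, 10 rows each
  value = runif(120, 0, 100)
)

ranked <- dat %>%
  group_by(date) %>%
  mutate(quintile = ntile(value, 5)) %>%  # 1 = lowest fifth, 5 = highest fifth within the month
  ungroup()

# Rows per month and bucket: every count is 2 here because 10 / 5 = 2
count(ranked, date, quintile)
```

When a month's size is not a multiple of 5, `ntile()` produces bucket sizes that differ by at most one, so totals over the whole dataset are only exactly equal if every month splits evenly.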
