What does replace mean in the sample function? [closed]
I am trying to figure out how the sample function works when I randomly select 10 samples from each group of a data frame.
I have a data frame with 5 columns and 7000 rows, which I split into around 200 groups. I then want to randomly select 10 samples from each group. Some groups have fewer than 10 samples, so I set replace=T when sampling them. However, when I checked the output, I found that some groups with more than 10 samples also contain repeated samples.
I am not sure how to fix this.
mydata2 <- split(mydf, mydf$Group)
names(mydata2) <- paste0("mydata2", 1:length(levels(mydf$Group)))
mysample <- Map(function(x) x[sample(1:nrow(x), size = 10, replace = T), ],
                mydata2)
r random
asked Mar 25 at 2:55 by Victor.H
closed as off-topic by r2evans, andrewdotnich, thewaywewere, Matteo Baldi, Rekshino Mar 25 at 8:28
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "This question was caused by a problem that can no longer be reproduced or a simple typographical error. While similar questions may be on-topic here, this one was resolved in a manner unlikely to help future readers. This can often be avoided by identifying and closely inspecting the shortest program necessary to reproduce the problem before posting." – r2evans, andrewdotnich, thewaywewere, Matteo Baldi, Rekshino
Googling "sample with replacement" produces countless applicable explanations, since it is a common and critical concept in statistics. Premise: when you "sample" something, does what you pull out go back into the "population", allowing it to be picked again? (replace=TRUE) Or does pulling it out randomly mean it is no longer available for future samples? (replace=FALSE)
– r2evans, Mar 25 at 3:00
2 Answers
Let's first understand the concept of replace.
By default, replace is FALSE in sample. So when you do
sample(1:5, 2)
#[1] 4 1
Because replace defaults to FALSE, elements cannot be repeated. The call randomly selects 2 unique elements from 1:5.
But now when you do
sample(1:5, 6)
it results in an error stating
Error in sample.int(length(x), size, replace, prob) :
cannot take a sample larger than the population when 'replace = FALSE'
which means you are trying to take 6 unique elements from 1:5, which is not possible because it contains only 5 unique elements. Setting replace = TRUE says that elements are allowed to be repeated.
sample(1:5, 6, replace = TRUE)
#[1] 3 3 5 3 1 1
Here we see that with replace = TRUE element 3 is repeated, whereas in the first example, no matter how many times you run it, elements will never repeat.
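As a quick check of the two behaviours described above, here is a minimal sketch. The seed and the variable names (pop, no_repl, with_repl, err) are arbitrary choices for illustration and are not part of the original answer.

```r
# Sketch: compare sampling without and with replacement on a small population.
set.seed(42)  # arbitrary seed, only so the run is repeatable

pop <- 1:5

no_repl   <- sample(pop, 5)                  # replace = FALSE by default
with_repl <- sample(pop, 6, replace = TRUE)  # more draws than elements

# Without replacement the result is a permutation, so nothing can repeat.
anyDuplicated(no_repl) > 0     # FALSE

# Six draws from five values must repeat at least one value.
anyDuplicated(with_repl) > 0   # TRUE

# Asking for more unique elements than exist raises an error;
# tryCatch captures its message instead of stopping the script.
err <- tryCatch(sample(pop, 6), error = function(e) conditionMessage(e))
err
```

anyDuplicated returns 0 when a vector has no repeated values, which makes it a convenient test for whether a sample contains repeats.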
So now, I hope the concept of replace is clear to you. You want to repeat rows only when a group has fewer than 10 rows, so you can add a condition accordingly:
lapply(mydata2, function(x) x[sample(1:nrow(x) ,size=10, replace=nrow(x) < 10), ])
The replace argument will be TRUE only when the number of rows in the data frame is less than 10.
Using a reproducible example from mtcars:
table(mtcars$cyl)
# 4 6 8
#11 7 14
We see cyl = 6 has 7 rows, which is less than 10.
mydata2 <- split(mtcars, mtcars$cyl)
lapply(mydata2, function(x) x[sample(1:nrow(x) ,size=10, replace=nrow(x) < 10), ])
#$`4`
# mpg cyl disp hp drat wt qsec vs am gear carb
#Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#$`6`
# mpg cyl disp hp drat wt qsec vs am gear carb
#Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#Valiant.1 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#Valiant.2 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#Mazda RX4.1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#$`8`
# mpg cyl disp hp drat wt qsec vs am gear carb
#Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
In the output we can see that rows are repeated only for cyl = 6 and not for the other groups.
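To confirm this programmatically rather than by reading the printed rows, the same idea can be sketched as follows. The helper name sample_idx and the variables groups, idx, picked and has_repeats are hypothetical, introduced only for this illustration. Repeats are checked on the sampled row indices, because a data frame renames repeated row names (for example Valiant.1).

```r
# Hypothetical helper: draw n row indices from a group of n_rows rows,
# allowing repeats only when the group is smaller than n.
sample_idx <- function(n_rows, n = 10) {
  sample(seq_len(n_rows), size = n, replace = n_rows < n)
}

groups <- split(mtcars, mtcars$cyl)

# Sample indices per group first, so repeats can be inspected directly.
idx <- lapply(groups, function(g) sample_idx(nrow(g)))

# Use the indices to pull the rows out of each group.
picked <- Map(function(g, i) g[i, , drop = FALSE], groups, idx)

# Every group ends up with exactly 10 rows.
sapply(picked, nrow)

# Only the group with fewer than 10 rows (cyl = 6) can contain repeats.
has_repeats <- sapply(idx, function(i) anyDuplicated(i) > 0)
has_repeats
```

With 7 rows and 10 draws, the cyl = 6 group necessarily repeats at least one row, while the other two groups are sampled without replacement and cannot.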
answered Mar 25 at 3:09 by Ronak Shah
Perhaps only use replace=T when the group is smaller than the minimum size? This flag allows duplicates in the sample, resulting in the behavior you're observing.
replace - If this is true a sample may contain an element several times while another element might not occur at all.
http://www.rexamples.com/14/Sample()
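If repeated rows are not wanted at all, one possible variant (an assumption about the goal, not something the question states) is to cap the sample size at the group size, so small groups simply return all of their rows. The helper name sample_up_to is hypothetical.

```r
# Hypothetical variant: never repeat rows. Groups with fewer than n rows
# return all of their rows (in random order) instead of being padded.
sample_up_to <- function(df, n = 10) {
  df[sample(seq_len(nrow(df)), size = min(n, nrow(df))), , drop = FALSE]
}

picked <- lapply(split(mtcars, mtcars$cyl), sample_up_to)

# Group sizes are now capped rather than padded to 10.
sapply(picked, nrow)
#  4  6  8
# 10  7 10
```

The trade-off is that groups no longer have equal sizes, which may or may not matter for the later analysis.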
answered Mar 25 at 3:06 by jspcal