What does replace mean in the sample function? [closed]


I am trying to understand how the sample function works when I randomly select 10 samples from each group of a data frame.

I have a data frame with 5 columns and 7000 rows, which I split into around 200 groups. I want to randomly select 10 samples from each group. Some groups have fewer than 10 samples, so I set replace=T when sampling. However, when I checked the output, I found that some groups with more than 10 samples also contain repeated samples.

How can I fix this?

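 # Split the data frame into one data frame per group
 mydata2 <- split(mydf, mydf$Group)
 names(mydata2) <- paste0("mydata2", 1:length(levels(mydf$Group)))
 # Draw 10 rows per group; replace=T applies to every group, large or small
 mysample <- Map(function(x) x[sample(1:nrow(x), size=10, replace=T), ],
                 mydata2)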

asked Mar 25 at 2:55 by Victor.H

closed as off-topic by r2evans, andrewdotnich, thewaywewere, Matteo Baldi, Rekshino Mar 25 at 8:28

This question appears to be off-topic. The users who voted to close gave this specific reason:

"This question was caused by a problem that can no longer be reproduced or a simple typographical error. While similar questions may be on-topic here, this one was resolved in a manner unlikely to help future readers. This can often be avoided by identifying and closely inspecting the shortest program necessary to reproduce the problem before posting." – r2evans, andrewdotnich, thewaywewere, Matteo Baldi, Rekshino

If this question can be reworded to fit the rules in the help center, please edit the question.

Googling "sample with replacement" produces countless applicable explanations, since it is a common and critical concept in statistics. Premise: when you "sample" something, does what you pull out go back into the "population", allowing it to be picked again? (replace=TRUE) Or does pulling it out randomly mean it is no longer available for future samples? (replace=FALSE)

– r2evans, Mar 25 at 3:00

Tags: r, random


2 Answers

Let's first understand the concept of replace.

By default, replace is FALSE in sample(), so when you run

sample(1:5, 2)
#[1] 4 1

Because replace defaults to FALSE, no element can be picked more than once: the call randomly selects 2 distinct elements from 1:5.

But now when you do

sample(1:5, 6)

it results in an error stating

Error in sample.int(length(x), size, replace, prob) :
cannot take a sample larger than the population when 'replace = FALSE'

This means you are asking for 6 distinct elements from 1:5, which is impossible because it contains only 5. Setting replace = TRUE allows elements to be repeated:

sample(1:5, 6, replace = TRUE)
#[1] 3 3 5 3 1 1

Here, with replace = TRUE, elements 3 and 1 are repeated, whereas the first example will never repeat an element no matter how many times you run it.
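The three cases above can be checked in one short script. This is only a sketch: the seed value and the variable names (no_repl, too_many, with_repl) are assumptions made for illustration, and tryCatch is used just so the failing call does not stop the script.

```r
set.seed(42)  # assumption: any seed works; fixed only for repeatability

# Without replacement: drawing all 5 elements gives a permutation, so no repeats.
no_repl <- sample(1:5, 5)
anyDuplicated(no_repl) == 0    # TRUE

# Asking for more elements than exist fails when replace = FALSE;
# tryCatch captures the error message instead of aborting.
too_many <- tryCatch(sample(1:5, 6), error = function(e) conditionMessage(e))

# With replacement: 6 draws from only 5 values must repeat at least one value.
with_repl <- sample(1:5, 6, replace = TRUE)
anyDuplicated(with_repl) > 0   # TRUE
```

anyDuplicated() returns 0 when a vector has no repeated values and the index of the first repeat otherwise, which makes it a convenient check here.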

I hope the concept of replace is now clear. You want to repeat rows only when a group has fewer than 10 rows, so you can add a condition accordingly:

lapply(mydata2, function(x) x[sample(1:nrow(x), size = 10, replace = nrow(x) < 10), ])

Here replace is TRUE only when the data frame has fewer than 10 rows.

Here is a reproducible example using mtcars:

table(mtcars$cyl)

# 4  6  8
#11  7 14

We see that cyl = 6 has only 7 rows, which is fewer than 10.

mydata2 <- split(mtcars, mtcars$cyl)

lapply(mydata2, function(x) x[sample(1:nrow(x), size = 10, replace = nrow(x) < 10), ])


#$`4`
#                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
#Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2

#$`6`
#                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#Valiant.1      18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#Valiant.2      18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#Mazda RX4.1    21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

#$`8`
#                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4

In the output, rows are repeated only for cyl = 6 (R marks the repeats with row-name suffixes such as Valiant.1 and Mazda RX4.1) and not for the other groups.
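Rather than reading the printed rows, the same conclusion can be checked programmatically. The sketch below is an assumption-laden variant of the code above, not the exact code from this answer: sample_idx is a hypothetical helper, sample.int(n, ...) is swapped in for sample(1:nrow(x), ...), and the seed is fixed only for repeatability.

```r
set.seed(1)  # assumption: fixed seed only so the run is repeatable

groups <- split(mtcars, mtcars$cyl)

# Hypothetical helper: draw `size` row indices from 1..n, allowing repeats
# only when the group is too small to supply `size` distinct rows.
sample_idx <- function(n, size = 10) {
  sample.int(n, size = size, replace = n < size)
}

idx_by_group <- lapply(groups, function(x) sample_idx(nrow(x)))
mysample     <- Map(function(x, i) x[i, ], groups, idx_by_group)

# TRUE where a group's sample contains at least one repeated row index
has_repeats <- vapply(idx_by_group, function(i) anyDuplicated(i) > 0, logical(1))
has_repeats
#     4     6     8
# FALSE  TRUE FALSE
```

The result does not depend on the seed: the cyl = 6 group has 7 rows, so 10 draws must repeat at least one of them, while the groups with 11 and 14 rows are sampled without replacement and cannot repeat.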

answered Mar 25 at 3:09 by Ronak Shah


Perhaps use replace=T only when the group is smaller than the requested sample size? This flag allows duplicates in the sample, which is the behavior you are observing.

replace - If this is true a sample may contain an element several times while another element might not occur at all.

http://www.rexamples.com/14/Sample()
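If repeated rows are not wanted at all, a different option (not proposed in either answer, so treat it as an assumption about what the asker might accept) is to never use replacement and instead cap the sample at the group size. A minimal sketch on mtcars, with groups and capped as illustrative names:

```r
groups <- split(mtcars, mtcars$cyl)

# Take at most 10 rows per group, always without replacement:
# small groups simply contribute every row they have, in random order.
capped <- lapply(groups, function(x) {
  n <- nrow(x)
  x[sample.int(n, size = min(10, n)), , drop = FALSE]
})

vapply(capped, nrow, integer(1))
#  4  6  8
# 10  7 10
```

The trade-off is that groups no longer all have the same sample size, which may or may not matter for the downstream analysis.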

answered Mar 25 at 3:06 by jspcal
