


What does the replace argument mean in the sample function? [closed]
















I am trying to understand how the sample function works when I randomly select 10 samples from each group of a data frame.

I have a data frame with 5 columns and 7000 rows, which I split into around 200 groups. I want to randomly select 10 rows from each group. Some groups have fewer than 10 rows, so I set replace=T when sampling. However, when I checked the output, I found that some groups with more than 10 rows also contain repeated rows.

How can I fix this?



mydata2 <- split(mydf, mydf$Group)
names(mydata2) <- paste0("mydata2", 1:length(levels(mydf$Group)))
mysample <- Map(function(x) x[sample(1:nrow(x), size = 10, replace = T), ],
                mydata2)






r random
















asked Mar 25 at 2:55









Victor.H





closed as off-topic by r2evans, andrewdotnich, thewaywewere, Matteo Baldi, Rekshino Mar 25 at 8:28


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "This question was caused by a problem that can no longer be reproduced or a simple typographical error. While similar questions may be on-topic here, this one was resolved in a manner unlikely to help future readers. This can often be avoided by identifying and closely inspecting the shortest program necessary to reproduce the problem before posting." – r2evans, andrewdotnich, thewaywewere, Matteo Baldi, Rekshino
If this question can be reworded to fit the rules in the help center, please edit the question.












Googling "sample with replacement" produces countless applicable explanations, since it is a common and critical concept in statistics. Premise: when you "sample" something, does what you pull out go back into the "population" allowing it to be picked again? (replace=TRUE) Or does pulling it out randomly mean it is no longer available for future samples? (replace=FALSE)

– r2evans
Mar 25 at 3:00



















2 Answers














Let's first understand what replace does.

By default, replace is FALSE in sample. So when you run



sample(1:5, 2)
#[1] 4 1


Because replace defaults to FALSE, no element can be drawn more than once: sample randomly selects 2 distinct elements from 1:5.



But when you run

sample(1:5, 6)

you get the error




Error in sample.int(length(x), size, replace, prob) :
cannot take a sample larger than the population when 'replace = FALSE'




This means you are trying to draw 6 distinct elements from 1:5, which is impossible because it contains only 5. Setting replace = TRUE allows elements to be drawn more than once.



sample(1:5, 6, replace = TRUE)
#[1] 3 3 5 3 1 1


Here, with replace = TRUE, elements 3 and 1 are repeated, whereas the first example will never repeat an element no matter how many times you run it.
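The difference can also be checked programmatically rather than by eye. The following is only a minimal sketch: the seed value and the variable names (no_repl, with_repl, err) are arbitrary choices for illustration and are not part of the question.

```r
set.seed(42)  # arbitrary seed, only so the draws are repeatable

# Without replacement every drawn value is distinct.
no_repl <- sample(1:5, 5)
anyDuplicated(no_repl) == 0    # TRUE

# With replacement, 6 draws from 5 values must repeat at least one value.
with_repl <- sample(1:5, 6, replace = TRUE)
anyDuplicated(with_repl) > 0   # TRUE

# Asking for more values than exist, without replacement, raises an error.
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(sample(1:5, 6), error = function(e) conditionMessage(e))
err
```

anyDuplicated returns 0 when there are no repeats and the index of the first repeat otherwise, which makes it a convenient test here.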




With the concept of replace clear, what you want is to repeat rows only when a group has fewer than 10 rows. So you can set replace conditionally,



lapply(mydata2, function(x) x[sample(1:nrow(x), size = 10, replace = nrow(x) < 10), ])


Here replace will be TRUE only when the data frame has fewer than 10 rows.
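If the same rule is needed in several places, it could be wrapped in a small function. This is a sketch only: sample_rows is a hypothetical helper name (not a base R function), and the toy data frame is made up purely for illustration. It uses sample.int, which draws from 1 to n directly.

```r
# sample_rows() is a hypothetical helper, not part of base R.
sample_rows <- function(df, n = 10) {
  # Allow repeats only when the group cannot supply n distinct rows.
  idx <- sample.int(nrow(df), size = n, replace = nrow(df) < n)
  df[idx, , drop = FALSE]
}

# Made-up toy data: group "a" has 3 rows, group "b" has 12 rows.
toy <- data.frame(Group = rep(c("a", "b"), times = c(3, 12)),
                  value = 1:15)

groups  <- split(toy, toy$Group)
sampled <- lapply(groups, sample_rows, n = 10)

# Both groups return 10 rows; only the small group "a" contains repeats.
vapply(sampled, nrow, integer(1))
anyDuplicated(sampled$a$value) > 0   # TRUE
anyDuplicated(sampled$b$value) == 0  # TRUE
```

Keeping n as a parameter avoids hard-coding 10 in both the size and the condition, so the two cannot drift apart.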



Using a reproducible example from mtcars:



table(mtcars$cyl)

# 4  6  8
#11  7 14


We see that cyl = 6 has 7 rows, which is fewer than 10.



mydata2 <- split(mtcars, mtcars$cyl)

lapply(mydata2, function(x) x[sample(1:nrow(x), size = 10, replace = nrow(x) < 10), ])


#$`4`
#                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
#Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2

#$`6`
#                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#Valiant.1      18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#Valiant.2      18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#Mazda RX4.1    21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

#$`8`
#                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4


In the output, rows are repeated only for cyl = 6 and not for the other groups.
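With around 200 groups, reading the printed output is not practical, so the same conclusion could be checked on the sampled indices instead. This is a rough sketch under the same mtcars setup; the names groups, idx and has_repeats, and the seed, are illustrative assumptions. Checking indices rather than row names matters because R renames repeated rows (Valiant.1, Valiant.2), which hides the duplication.

```r
set.seed(1)  # arbitrary seed for repeatability

groups <- split(mtcars, mtcars$cyl)

# Draw row indices per group, allowing repeats only for groups under 10 rows.
idx <- lapply(groups, function(x) {
  sample.int(nrow(x), size = 10, replace = nrow(x) < 10)
})

# TRUE for a group whose sample contains at least one repeated row.
has_repeats <- vapply(idx, function(i) anyDuplicated(i) > 0, logical(1))
has_repeats
#     4     6     8
# FALSE  TRUE FALSE
```

The result does not depend on the seed: groups 4 and 8 are sampled without replacement, and 10 draws from the 7 rows of group 6 must repeat at least one row.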






answered Mar 25 at 3:09

Ronak Shah

Perhaps only use replace=T when the group is smaller than the requested sample size? This flag allows duplicates in the sample, which results in the behavior you're observing.

replace - If this is true a sample may contain an element several times while another element might not occur at all.

http://www.rexamples.com/14/Sample()

















answered Mar 25 at 3:06

jspcal
