


How to randomly split data into three equal sizes?


I have a dataset with 9558 rows from three different projects. I want to randomly split this dataset into three equal groups and assign a unique ID to each group, so that Project1_Project2_Project3 becomes Project1, Project2 and Project3.

I have tried many things and googled code from people with similar problems. I have used sample_n() and sample_frac(), but unfortunately I can't solve this issue myself :/

I have made an example of my dataset that looks like this:

ProjectName <- c("Project1_Project2_Project3")
data <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))
data <- data.frame(ProjectName, data)

The output should be randomly split into three equal groups of nrow = 3186, each assigned one of these values:

ProjectName   Count of rows
Project1      3186
Project2      3186
Project3      3186


Tags: r, random, group-by, dplyr, divide

asked Mar 27 at 11:11 by Rose Nonglak Seesan Jensen
edited Mar 27 at 11:43 by kath


  • When you say split, this means that you do not want repeats in the groups, right? As in, data in 15 is only in 1 set.
    – Hojo.Timberwolf
    Mar 27 at 11:15

  • Does c("Project1", "Project2", "Project3") instead of c("Project1_Project2_Project3") give you what you want?
    – jay.sf
    Mar 27 at 11:18

  • @Hojo.Timberwolf Yes, I don't want repeats in the groups. What do you mean by "in 15 is only in 1 set"?
    – Rose Nonglak Seesan Jensen
    Mar 27 at 11:23

  • @jay.sf The real dataset that I have contains data from three different projects, with only one unique ID covering all of them, and it is structured the same way as the example I made. I would like to split it randomly into three equal groups, and each group should have its own name: Project1, Project2 and Project3 :)
    – Rose Nonglak Seesan Jensen
    Mar 27 at 11:23

  • This question needs to be modified and asked in a better way to be useful for others too!
    – Majid
    Mar 27 at 11:31

4 Answers

IMO it should be sufficient to just assign random project names. Build a balanced vector of labels with rep(), then shuffle it with sample():

dat$ProjectName <- sample(factor(rep(1:3, length.out = nrow(dat)),
                                 labels = paste0("Project", 1:3)))

Result

head(dat)
#   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 ProjectName
# 1  1  1  0  1  1  1  1  0  1   0    Project1
# 2  1  1  1  1  1  1  0  0  1   0    Project1
# 3  0  0  1  1  0  0  0  1  1   1    Project1
# 4  1  1  1  0  1  0  1  1  0   1    Project3
# 5  1  0  0  1  1  1  1  0  0   1    Project1
# 6  1  0  0  0  0  1  0  1  1   1    Project3

table(dat$ProjectName)
# Project1 Project2 Project3
#     3186     3186     3186

Data

set.seed(42)
dat <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))
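The one-liner above could be wrapped in a small helper that works for any number of groups. This is only a minimal sketch: the function name assign_random_groups and its k and prefix arguments are hypothetical and not part of the original answer. When the row count is not divisible by k, the first groups simply receive one extra row.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answer).
assign_random_groups <- function(df, k = 3, prefix = "Project") {
  # Balanced label vector 1, 2, ..., k, 1, 2, ... cut to exactly nrow(df)
  labels <- rep(seq_len(k), length.out = nrow(df))
  # Shuffling the labels randomises the assignment but keeps the group sizes
  df$ProjectName <- factor(sample(labels), levels = seq_len(k),
                           labels = paste0(prefix, seq_len(k)))
  df
}

set.seed(42)
dat <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))
dat <- assign_random_groups(dat)
table(dat$ProjectName)  # 3186 rows in each of Project1, Project2, Project3
```

Keeping the labels in a column rather than in three separate data frames leaves the data in one place; subsets can still be taken later with dat[dat$ProjectName == "Project1", ].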






answered Mar 27 at 11:50 by jay.sf
edited Mar 29 at 5:45


  • Thank you Jay :)
    – Rose Nonglak Seesan Jensen
    Mar 27 at 12:26

  • You're very welcome @RoseNonglakSeesanJensen.
    – jay.sf
    Mar 27 at 12:30


Add an id to data:

data$id <- seq_len(nrow(data))

Take the first sample, one third of the rows:

project1 <- dplyr::sample_frac(data, 1/3)

Remove the used rows from data and save the remainder into project2:

project2 <- data[!(data$id %in% project1$id), ]

Sample half of the remainder:

project3 <- dplyr::sample_frac(project2, 0.5)

Finally, remove the rows in the project3 sample from project2:

project2 <- project2[!(project2$id %in% project3$id), ]

Check that no id appears in more than one group:

# should all be FALSE
any(project1$id %in% project2$id)
any(project1$id %in% project3$id)
any(project2$id %in% project3$id)

And double-check that the data frames have the right number of rows:

nrow(project1)
nrow(project2)
nrow(project3)
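Whether sample_frac() yields exactly equal sizes depends on how the fractional size is rounded. As a hedged alternative, the same "sample, remove, sample again" idea can be sketched in base R with explicit row counts. This swaps dplyr for sample() and setdiff(); the names n, size, remaining and idx1 to idx3 are illustrative, and it assumes the row count is divisible by 3, as in the question.

```r
set.seed(42)
data <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))

n <- nrow(data)
size <- n %/% 3                       # 3186 rows per group when n = 9558

# Draw row indices without replacement, removing used rows at each step
idx1 <- sample(n, size)
remaining <- setdiff(seq_len(n), idx1)
idx2 <- sample(remaining, size)
idx3 <- setdiff(remaining, idx2)      # whatever is left forms the third group

project1 <- data[idx1, ]
project2 <- data[idx2, ]
project3 <- data[idx3, ]
```

Working on row indices avoids the extra id column, and the uniqueness check reduces to confirming that idx1, idx2 and idx3 do not overlap.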






answered Mar 27 at 11:21 by Phil

  • Thank you so much Phil :)
    – Rose Nonglak Seesan Jensen
    Mar 27 at 12:26


I had this same problem once, and this is how I solved it. If you just use sample(), the groups can come out uneven. Sampling without replacement from a vector in which the groups are already balanced worked for me.

sampleframe <- rep(1:3, ceiling(nrow(data) / 3))

data$grp <- sample(sampleframe, size = nrow(data), replace = FALSE)

project1 <- data[data$grp %in% 1, ]
project2 <- data[data$grp %in% 2, ]
project3 <- data[data$grp %in% 3, ]
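The three subsetting lines above could also be collapsed into one split() call that returns a named list. A minimal sketch, assuming the example data from the question; the list name projects and the "Project" prefix are illustrative choices, not part of the answer.

```r
set.seed(42)
data <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))

# Balanced labels, shuffled; rep_len() cuts the vector to exactly nrow(data)
data$grp <- sample(rep_len(1:3, nrow(data)))

# split() returns a named list with one data frame per value of grp
projects <- split(data, data$grp)
names(projects) <- paste0("Project", names(projects))

sapply(projects, nrow)
```

A list scales to more groups without new variable names, and individual groups remain accessible as projects$Project1 and so on.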






answered Mar 27 at 11:26 by MatthewR

  • Thank you so much for this! It works :D
    – Rose Nonglak Seesan Jensen
    Mar 27 at 12:26


I like the solution in this comment to a GitHub gist.

You could generate the indices as suggested. split() recycles the three-level factor along the shuffled row numbers, so each fold receives every third index:

folds <- split(sample(nrow(data), nrow(data), replace = FALSE), as.factor(1:3))

Then get a list of three equal-sized data frames using:

datalist <- lapply(folds, function(x) data[x, ])
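Because the call above relies on the grouping factor being recycled, it may be clearer to spell the group labels out in full. The following is a sketch under that assumption, using rep_len() to build one label per row; the variable n and the argument name rows are illustrative.

```r
set.seed(42)
data <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))

n <- nrow(data)
# Shuffle the row numbers, then deal them out to folds 1, 2, 3, 1, 2, 3, ...
folds <- split(sample(n), rep_len(1:3, n))
datalist <- lapply(folds, function(rows) data[rows, ])

lengths(folds)                 # number of row indices in each fold
anyDuplicated(unlist(folds))   # 0 means no row landed in two folds
```

With a full-length label vector the fold sizes differ by at most one row even when n is not a multiple of 3.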






answered Mar 27 at 11:41 by neilfws