How to randomly split data into three equal sizes?

I have a dataset with 9558 rows from three different projects. I want to randomly split this dataset into three equal groups and assign a unique ID to each group, so that Project1_Project2_Project3 becomes Project1, Project2 and Project3.

I have tried many things and looked up code from people with similar problems. I have used sample_n() and sample_frac(), but unfortunately I can't solve this issue myself :/

Here is an example that mimics my dataset:

ProjectName <- c("Project1_Project2_Project3")
data <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))
data <- data.frame(ProjectName, data)

The result should be a random split into three equal groups of 3186 rows each, labelled as follows:

ProjectName   Count of rows
Project1      3186
Project2      3186
Project3      3186

edited Mar 27 at 11:43

kath

asked Mar 27 at 11:11

Rose Nonglak Seesan Jensen

When you say split, you mean that you do not want repeats across the groups, right? As in, the data in row 15 is only in one set?

– Hojo.Timberwolf
Mar 27 at 11:15

Does c("Project1", "Project2", "Project3") instead of c("Project1_Project2_Project3") give you what you want?

– jay.sf
Mar 27 at 11:18

@Hojo.Timberwolf Yes, I don't want repeats in the groups. What do you mean by "15 is only in 1 set"?

– Rose Nonglak Seesan Jensen
Mar 27 at 11:23

@jay.sf My real dataset contains data from three different projects, but there is only one unique ID for all of it, and it is structured the same way as my example. I would like to split it randomly into three equal groups, each with its own name: Project1, Project2 and Project3 :)

– Rose Nonglak Seesan Jensen
Mar 27 at 11:23

This question needs to be simply modified and asked in a better way to be useful for others too!

– Majid
Mar 27 at 11:31

Tags: r, random, group-by, dplyr, divide

4 Answers

IMO it is sufficient to just assign random project names, balanced so that each name appears equally often:

dat$ProjectName <- sample(factor(rep(1:3, length.out=nrow(dat)), 
 labels=paste0("Project", 1:3)))

Result

head(dat)
# X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 ProjectName
# 1 1 1 0 1 1 1 1 0 1 0 Project1
# 2 1 1 1 1 1 1 0 0 1 0 Project1
# 3 0 0 1 1 0 0 0 1 1 1 Project1
# 4 1 1 1 0 1 0 1 1 0 1 Project3
# 5 1 0 0 1 1 1 1 0 0 1 Project1
# 6 1 0 0 0 0 1 0 1 1 1 Project3

table(dat$ProjectName)
# Project1 Project2 Project3 
# 3186 3186 3186

Data

set.seed(42)
dat <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))
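If you also need the three separate data frames the question asks for, one option (a sketch, not part of the original answer) is to split on the new column with base R's split(). The seed and the name `projects` are only for illustration.

```r
# Example data in the shape used above: 9558 rows, 10 binary columns
set.seed(42)
dat <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))

# Balanced random labels: rep() gives 3186 of each, sample() shuffles them
dat$ProjectName <- sample(factor(rep(1:3, length.out = nrow(dat)),
                                 labels = paste0("Project", 1:3)))

# Named list with one data frame per project (names follow the factor levels)
projects <- split(dat, dat$ProjectName)

sapply(projects, nrow)  # 3186 rows in each of Project1, Project2, Project3
```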

edited Mar 29 at 5:45

answered Mar 27 at 11:50

jay.sf

Thank you Jay :)

– Rose Nonglak Seesan Jensen
Mar 27 at 12:26

You're very welcome @RoseNonglakSeesanJensen.

– jay.sf
Mar 27 at 12:30

Add an id to data:

data$id <- 1:nrow(data)

Take the first sample:

project1 <- dplyr::sample_frac(data, 1/3)  # 1/3 rather than 0.33333, which is slightly less than a third

Remove the sampled rows from data and store the rest in project2:

project2 <- data[!(data$id %in% project1$id), ]

Sample half of the remainder:

project3 <- dplyr::sample_frac(project2, 0.5)

Finally, remove the rows sampled into project3 from project2:

project2 <- project2[!(project2$id %in% project3$id), ]

Check that no id appears in more than one group:

# should all be FALSE
any(project1$id %in% project2$id)
any(project1$id %in% project3$id)
any(project2$id %in% project3$id)

And double-check that each data frame has the right number of rows:

nrow(project1)
nrow(project2)
nrow(project3)
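The same take-a-third-then-halve-the-rest idea can also be done in one pass with base R: shuffle the row indices once with sample() and cut the shuffled vector into consecutive thirds. This is a sketch assuming a data frame named `data` like the one in the question; the seed is arbitrary. When nrow(data) is not a multiple of 3, the last group absorbs the leftover rows.

```r
set.seed(1)
data <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))

n <- nrow(data)
idx <- sample(n)      # a random permutation of 1..n
third <- n %/% 3      # 3186 rows per group for n = 9558

project1 <- data[idx[seq_len(third)], ]
project2 <- data[idx[third + seq_len(third)], ]
project3 <- data[idx[(2 * third + 1):n], ]  # also takes any remainder rows

# Same checks as above, using the original row names instead of an id column
any(rownames(project1) %in% rownames(project2))
c(nrow(project1), nrow(project2), nrow(project3))
```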

answered Mar 27 at 11:21

Phil

Thank you so much Phil :)

– Rose Nonglak Seesan Jensen
Mar 27 at 12:26

I had this same problem once, and this is how I solved it. If you just use sample() with replacement, the groups come out uneven; sampling without replacement from a vector in which the groups are already even worked for me.

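# Pool of group labels, each label equally often, at least nrow(data) long
sampleframe <- rep(1:3, ceiling(nrow(data) / 3))

# Draw nrow(data) labels without replacement; equal groups whenever nrow(data) is a multiple of 3
data$grp <- sample(sampleframe, size = nrow(data), replace = FALSE)

project1 <- data[data$grp == 1, ]
project2 <- data[data$grp == 2, ]
project3 <- data[data$grp == 3, ]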
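One caveat, offered as a sketch rather than a correction: when nrow(data) is not a multiple of 3, this pool has up to two extra labels and sample() drops them at random, so group sizes can differ by two rows. Building the pool with rep(1:3, length.out = nrow(data)), the same idea used in the factor-based answer, keeps the difference to at most one. For the question's 9558 rows (a multiple of 3) both pools are identical, so this only matters for other row counts. A check on a hypothetical 10-row data frame:

```r
toy <- data.frame(x = 1:10)  # hypothetical 10-row data frame

# Pool as in the answer: 3 * ceiling(10 / 3) = 12 labels for 10 rows,
# so two labels are dropped at random and the sizes vary between runs
set.seed(2)
spread <- replicate(1000, {
  g <- sample(rep(1:3, ceiling(nrow(toy) / 3)), size = nrow(toy))
  diff(range(tabulate(g, nbins = 3)))
})
table(spread)  # largest minus smallest group size: 1 or 2

# Pool with exactly nrow(toy) labels (1,2,3,1,2,3,1,2,3,1): shuffling it
# changes which rows get which label, never how many rows each label gets
toy$grp <- sample(rep(1:3, length.out = nrow(toy)))
table(toy$grp)  # 4, 3, 3 on every run
```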

answered Mar 27 at 11:26

MatthewR

Thank you so much for this! It works :D

– Rose Nonglak Seesan Jensen
Mar 27 at 12:26

I like the solution in this comment on a GitHub gist.

You could generate the indices as suggested:

folds <- split(sample(nrow(data), nrow(data), replace = FALSE), as.factor(1:3))

Then get a list of three equally sized data frames:

datalist <- lapply(folds, function(x) data[x, ])
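To get the Project1/Project2/Project3 names the question asks for, the folds can be named, or used to label the original rows instead of splitting them. A sketch, assuming the question's 9558-row `data`; sample(nrow(data)) is equivalent to the sample(nrow(data), nrow(data), replace = FALSE) call above, and the loop is just one way to write the labelling.

```r
set.seed(7)
data <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))

# Shuffle the row indices, then deal them round-robin into three folds
folds <- split(sample(nrow(data)), as.factor(1:3))

# Option 1: a named list of three data frames
datalist <- lapply(folds, function(x) data[x, ])
names(datalist) <- paste0("Project", seq_along(datalist))

# Option 2: keep one data frame and record each row's project
data$ProjectName <- NA_character_
for (p in seq_along(folds)) {
  data$ProjectName[folds[[p]]] <- paste0("Project", p)
}
table(data$ProjectName)
```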

answered Mar 27 at 11:41

neilfws
