How to randomly split data into three equal sizes?
I have a dataset with 9558 rows from three different projects. I want to randomly split this dataset into three equal groups and assign a unique ID to each group, so that Project1_Project2_Project3 becomes Project1, Project2 and Project3.
I have tried many things and searched for code from people with a similar problem. I have used sample_n() and sample_frac(), but unfortunately I can't solve this issue myself :/
I have made an example of my dataset looking like this:
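# One combined label for every row, as in the real dataset
ProjectName <- c("Project1_Project2_Project3")
# 9558 rows x 10 columns of random 0/1 values
data <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))
data <- data.frame(ProjectName, data)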
The output should be randomly split into three equal groups of nrow = 3186 each, labelled as follows:
ProjectName    Count of rows
Project1       3186
Project2       3186
Project3       3186
r random group-by dplyr divide
edited Mar 27 at 11:43 by kath
asked Mar 27 at 11:11 by Rose Nonglak Seesan Jensen
when you say split this means that you do not want repeats in the groups right? as in data in 15 is only in 1 set
– Hojo.Timberwolf
Mar 27 at 11:15
Does c("Project1", "Project2", "Project3") instead of c("Project1_Project2_Project3") give you what you want?
– jay.sf
Mar 27 at 11:18
@Hojo.Timberwolf Yes, I don't want repeats in the groups. What do you mean by "in 15 is only in 1 set"?
– Rose Nonglak Seesan Jensen
Mar 27 at 11:23
@jay.sf The real dataset that I have contains data from three different projects and there is only one unique ID for this and it is structured the same way as the one I made. But I would like to split it randomly into three equal groups and each group should have their own name: Project1, Project2 and Project3 :)
– Rose Nonglak Seesan Jensen
Mar 27 at 11:23
This question needs to be simply modified and asked in a better way to be useful for others too!
– Majid
Mar 27 at 11:31
4 Answers
IMO it should be sufficient to just assign the project names at random: build a balanced vector of labels with rep() and shuffle it with sample().
dat$ProjectName <- sample(factor(rep(1:3, length.out=nrow(dat)),
labels=paste0("Project", 1:3)))
Result
head(dat)
# X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 ProjectName
# 1 1 1 0 1 1 1 1 0 1 0 Project1
# 2 1 1 1 1 1 1 0 0 1 0 Project1
# 3 0 0 1 1 0 0 0 1 1 1 Project1
# 4 1 1 1 0 1 0 1 1 0 1 Project3
# 5 1 0 0 1 1 1 1 0 0 1 Project1
# 6 1 0 0 0 0 1 0 1 1 1 Project3
table(dat$ProjectName)
# Project1 Project2 Project3
# 3186 3186 3186
Data
set.seed(42)
dat <- data.frame(replicate(10, sample(0:1, 9558, rep=TRUE)))
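If separate data frames are needed rather than a label column, the labelled data could be passed to base R's split(). This is only a minimal sketch, assuming the dat object and ProjectName column from the answer above; the name projects is made up for illustration.

```r
set.seed(42)
dat <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))

# Balanced labels (3186 of each), shuffled into random order
dat$ProjectName <- sample(factor(rep(1:3, length.out = nrow(dat)),
                                 labels = paste0("Project", 1:3)))

# 'projects' is a hypothetical name: a list with one data frame per label
projects <- split(dat, dat$ProjectName)

# Row count of each piece
sapply(projects, nrow)
```

Individual groups can then be picked out by name, for example projects$Project1.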
edited Mar 29 at 5:45
answered Mar 27 at 11:50 by jay.sf
Thank you Jay :)
– Rose Nonglak Seesan Jensen
Mar 27 at 12:26
You're very welcome @RoseNonglakSeesanJensen.
– jay.sf
Mar 27 at 12:30
Add an id to data:
data$id <- 1:nrow(data)
Take the first sample:
project1 <- dplyr::sample_frac(data, 0.33333)
Remove the used rows from data and save the remainder into project2:
project2 <- data[!(data$id %in% project1$id), ]
Sample half of the remainder:
project3 <- dplyr::sample_frac(project2, 0.5)
Finally, remove the rows in the project3 sample from project2:
project2 <- project2[!(project2$id %in% project3$id), ]
Check that no id appears in more than one group:
# should all be FALSE
any(project1$id %in% project2$id)
any(project1$id %in% project3$id)
any(project2$id %in% project3$id)
And double-check the data frames have the right number of cases:
nrow(project1)
nrow(project2)
nrow(project3)
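The fractions 0.33333 and 0.5 happen to round to 3186 rows for this dataset. As a hedged alternative in base R, the group size can be computed explicitly and the ids shuffled once; this is a sketch rather than the answer's own method, and the names shuffled and size are assumptions introduced here.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
data <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))
data$id <- seq_len(nrow(data))

n <- nrow(data)
shuffled <- sample(data$id)  # random permutation of the ids
size <- n %/% 3              # rows per group; any remainder goes to group 3

project1 <- data[data$id %in% shuffled[1:size], ]
project2 <- data[data$id %in% shuffled[(size + 1):(2 * size)], ]
project3 <- data[data$id %in% shuffled[(2 * size + 1):n], ]

# No id appears in more than one group
stopifnot(!anyDuplicated(c(project1$id, project2$id, project3$id)))

c(nrow(project1), nrow(project2), nrow(project3))
```

Because every id is drawn from a single permutation, the three groups cannot overlap and together cover all rows.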
answered Mar 27 at 11:21 by Phil
Thank you so much Phil :)
– Rose Nonglak Seesan Jensen
Mar 27 at 12:26
I had this same problem once, and this is how I solved it. If you just use sample() to draw the group labels, the groups come out uneven; sampling without replacement from a vector in which the groups are already even worked for me.
# Balanced pool of group labels, at least as long as the data
sampleframe <- rep(1:3, ceiling(nrow(data) / 3))
# Draw one label per row without replacement
data$grp <- sample(sampleframe, size = nrow(data), replace = FALSE)
project1 <- data[data$grp %in% 1, ]
project2 <- data[data$grp %in% 2, ]
project3 <- data[data$grp %in% 3, ]
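To illustrate the difference described here, the sketch below compares independent draws of the labels with a shuffle of a balanced pool. It assumes n = 9558 as in the question; the names independent, pool and balanced are hypothetical.

```r
set.seed(123)  # arbitrary seed, only for reproducibility
n <- 9558

# Independent draws: each label has probability 1/3, so the counts
# are only approximately n / 3
independent <- sample(1:3, n, replace = TRUE)
table(independent)

# Shuffling a balanced pool: the counts are fixed by construction
pool <- rep(1:3, ceiling(n / 3))
balanced <- sample(pool, size = n, replace = FALSE)
table(balanced)
```

When n is not a multiple of 3, the pool is slightly longer than n, so the balanced counts can differ from each other by at most one.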
answered Mar 27 at 11:26 by MatthewR
Thank you so much for this! It works :D
– Rose Nonglak Seesan Jensen
Mar 27 at 12:26
I like the solution in this comment to a GitHub gist.
You could generate the indices as suggested:
folds <- split(sample(nrow(data), nrow(data), replace = FALSE), as.factor(1:3))
Then get a list of 3 equal size data frames using:
datalist <- lapply(folds, function(x) data[x, ])
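The call above relies on split() recycling the three-level factor across all row indices. A variant that spells the grouping vector out at full length with rep_len() and names the pieces might look like the following sketch; the names groups and idx and the Project labels on the list are assumptions added here.

```r
set.seed(7)  # arbitrary seed, only for reproducibility
data <- data.frame(replicate(10, sample(0:1, 9558, replace = TRUE)))
n <- nrow(data)

# Shuffle the row numbers and deal them out to groups 1, 2, 3 in turn
groups <- rep_len(1:3, n)
folds <- split(sample(n), groups)
names(folds) <- paste0("Project", 1:3)

# One data frame per group, selected by shuffled row number
datalist <- lapply(folds, function(idx) data[idx, ])
sapply(datalist, nrow)
```

With a full-length grouping vector the group sizes differ by at most one row even when the number of rows is not divisible by 3.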
answered Mar 27 at 11:41 by neilfws