How can I tidy student enrollment data on a per semester basis?
I have a dataset that currently lists student information on a term basis (e.g., 201610, 201620, 201630, 201640, 201710, etc.) with suffix 10 = fall, 20 = winter, 30 = spring, and 40 = summer. Not all terms are necessarily listed for every student.
What I would like to do is identify the first term in which a student was enrolled, presumably the fall, as T1, and subsequent terms as T2, T3, etc. Since some students may take a winter or summer term, I would like to identify those as T1_Winter, T2_Summer, etc.
I've been able to isolate the individual terms for which a student has enrolled, and have been able to identify the first, intermediate, and last terms as 1, 2, 3, etc. However, I can't manage to wrap my head around how to identify fall and spring as 1, 2, 3, 4, and the intermediary terms, winter and summer, as 1.5, 2.5, 3.5, 4.5, etc.
    # Create the sample dataset
    data <- data.frame(
      ID = c(1, 1, 1, 2, 2, 2, 2),
      RegTerm = c(201810, 201820, 201830, 201910, 201930, 201940, 202010)
    )
    # Isolate student IDs and terms
    stdTerm <- subset(data, select = c("ID", "RegTerm"))
    # Sort according to ID and RegTerm
    stdTerm <- stdTerm[
      with(stdTerm, order(ID, RegTerm)),
    ]
    # Remove duplicate combinations of ID and term
    y <- stdTerm[!duplicated(stdTerm[c(1, 2)]), ]
    # Create an index to identify the term number
    # for which a student enrolled
    library(dplyr)
    z <- y %>%
      arrange(ID, RegTerm) %>%
      group_by(ID) %>%
      mutate(StdTermIndex = seq(n()))
Right now, it's identifying the progression of all terms for a student as 1, 2, 3, etc., but not winter and summer as intermediary terms. That is, if a student enrolled in fall and winter, winter will appear as 2 and spring will appear as 3.
In the sample data provided, I would like Student ID 1 to reflect 201810 as 1, 201820 as 1.5, and 201830 as 2, etc. Any suggestions or previous code I could reference to wrap my head around how I can code the intermediary semesters?
r dplyr data-analysis
asked Mar 27 at 19:52 by Anna K (edited Mar 27 at 20:35)
Give us some sample data so we can better understand your problem.
– Felipe Alvarenga
Mar 27 at 20:29
Also, check this out stackoverflow.com/questions/5963269/…
– Felipe Alvarenga
Mar 27 at 20:30
Thanks, @FelipeAlvarenga! My apologies as it's my first time posting here. I've included a sample dataset in my question and hope it clarifies the problem.
– Anna K
Mar 27 at 20:38
2 Answers
So, to do it in your sample, I created a helper variable that tells me whether the fifth digit of RegTerm is even or odd.
The reason is simple: an odd digit (1 or 3) means it is a regular term (fall or spring), whereas an even digit (2 or 4) means it is a winter or summer term.
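    library(dplyr)
    library(stringr)  # provides str_extract()
    data <- data.frame(
      ID = c(1, 1, 1, 2, 2, 2, 2),
      RegTerm = c(201810, 201820, 201830, 201910, 201930, 201940, 202010)
    )
    dat <- data %>%
      # Extract the 5th digit of RegTerm, then keep its parity (1 = odd, 0 = even)
      mutate(term = str_extract(as.character(RegTerm), '(?<=\\d{4})\\d{1}(?=0)'),
             term = as.numeric(term) %% 2) %>%
      group_by(ID) %>%
      # Count regular terms per student; put winter/summer half a step later
      mutate(numTerm = cumsum(term),
             numTerm = ifelse(term == 0, numTerm + 0.5, numTerm))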
The first mutate extracts the 5th digit of the RegTerm column and takes the remainder of its division by 2. If it equals 1, it is a regular term; otherwise it is either summer or winter.
Next I take the cumulative sum of this variable, which gives the number of the regular term the student is in. Then, for every term == 0, I add 0.5 to numTerm to account for the winter and summer terms.
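To see the two steps in isolation, here is a minimal base R trace. It is only a sketch: the `term` vector is a hypothetical set of parity flags for one student (fall, winter, spring, summer, fall), and `running` is a name introduced here for illustration.

```r
# Hypothetical parity flags for one student: 1 = regular term, 0 = winter/summer
term <- c(1, 0, 1, 0, 1)

# Step 1: running count of regular terms seen so far
running <- cumsum(term)

# Step 2: place each intermediary term half a step past the last regular term
numTerm <- ifelse(term == 0, running + 0.5, running)
numTerm
```

The running count only advances on regular terms, so a winter or summer term always lands between the regular term before it and the one after it.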
The resulting dat:
    # A tibble: 7 x 4
    # Groups:   ID [2]
         ID RegTerm  term numTerm
      <dbl>   <dbl> <dbl>   <dbl>
    1     1  201810     1     1
    2     1  201820     0     1.5
    3     1  201830     1     2
    4     2  201910     1     1
    5     2  201930     1     2
    6     2  201940     0     2.5
    7     2  202010     1     3
This way, if a student starts in a winter term, numTerm is assigned 0.5, and it becomes 1 only when the student reaches a regular term (term == 1).
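The question also asks for labels such as T1_Winter and T2_Summer. One possible way to get them, sketched here under some assumptions, is to use the two-digit suffix directly instead of the parity of the fifth digit. The columns `suffix`, `regular`, `regCount`, and `label` are names made up for this example, and student 3 is a hypothetical addition to the sample data to show a winter start.

```r
library(dplyr)

# Sample data from the question plus a hypothetical student 3 who starts in winter
data <- data.frame(
  ID      = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  RegTerm = c(201810, 201820, 201830, 201910, 201930, 201940, 202010,
              201920, 201930)
)

labelled <- data %>%
  arrange(ID, RegTerm) %>%
  mutate(
    suffix  = RegTerm %% 100,          # 10, 20, 30 or 40
    regular = suffix %in% c(10, 30)    # fall and spring are the regular terms
  ) %>%
  group_by(ID) %>%
  mutate(
    regCount = cumsum(regular),                       # regular terms so far
    numTerm  = regCount + ifelse(regular, 0, 0.5),    # 1, 1.5, 2, ...
    label    = case_when(
      regular      ~ paste0("T", regCount),
      suffix == 20 ~ paste0("T", regCount, "_Winter"),
      suffix == 40 ~ paste0("T", regCount, "_Summer")
    )
  ) %>%
  ungroup()
```

With this sketch a student who starts in winter gets the label T0_Winter, since no regular term has been counted yet; whether that is the desired label for such students is a decision the question leaves open.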
answered Mar 28 at 11:56 by Felipe Alvarenga (edited Mar 28 at 12:26)
I think a good way to do this would be to separate your RegTerm column into year and suffix and then apply some conditional logic once you have the values split up.
The code below does that for a single value; we then just have to apply it to the whole column and do some rejigging.
    paste(strsplit(as.character(201810), "")[[1]][1:4], collapse = "")
    # "2018"
    paste(strsplit(as.character(201810), "")[[1]][5:6], collapse = "")
    # "10"
So, to do it on the data frame, you can use something like lapply, then unlist the result and add it as a new column. After that you can convert the values to numeric and use conditional statements in a mutate call to set the intermediary values, etc.
    z$year <- unlist(lapply(z$RegTerm, function(x) paste(strsplit(as.character(x), "")[[1]][1:4], collapse = "")))
    z$suf <- unlist(lapply(z$RegTerm, function(x) paste(strsplit(as.character(x), "")[[1]][5:6], collapse = "")))
It looks a bit ugly, but all it is doing is splitting RegTerm into individual characters, selecting the first 4 or last 2 characters for year and suf respectively, and collapsing them (using collapse = "" in paste) into a single string. We lapply this over the whole column, then unlist the result to make a vector.
I would recommend working through the first two lines of code in this answer; the rest should then be clear.
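The same split can probably be written more compactly with base R's vectorised `substr`, which avoids the `lapply`/`unlist` round trip. This is a sketch rather than the answer's own method; `terms` and `term_chr` are illustrative names, and the values are the RegTerm column from the question.

```r
# RegTerm values from the question's sample data
terms <- c(201810, 201820, 201830, 201910, 201930, 201940, 202010)

# Convert once to character, then slice fixed positions for every element
term_chr <- as.character(terms)
year <- as.numeric(substr(term_chr, 1, 4))  # characters 1-4: the year
suf  <- as.numeric(substr(term_chr, 5, 6))  # characters 5-6: the term suffix
```

Because the codes are numeric, integer arithmetic (`terms %/% 100` and `terms %% 100`) should give the same two columns without any string handling.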
answered Mar 27 at 23:08 by Croote