Subtract rows varying one column but keeping others fixed
I have an experiment where I need to subtract the values of two different treatments from the Control (baseline), but each subtraction must match rows on the other columns, Block and Year sampled.
Dummy data frame:
df <- data.frame("Treatment" = c("Control","Treat1", "Treat2"),
"Block" = rep(1:3, each=3), "Year" = rep(2011:2013, each=3),
"Value" = c(6,12,4,3,9,5,6,3,1));df
Treatment Block Year Value
1 Control 1 2011 6
2 Treat1 1 2011 12
3 Treat2 1 2011 4
4 Control 2 2012 3
5 Treat1 2 2012 9
6 Treat2 2 2012 5
7 Control 3 2013 6
8 Treat1 3 2013 3
9 Treat2 3 2013 1
Desired output:
Treatment Block Year Value
1 Control-Treat1 1 2011 -6
2 Control-Treat2 1 2011 2
3 Control-Treat1 2 2012 -6
4 Control-Treat2 2 2012 -2
5 Control-Treat1 3 2013 3
6 Control-Treat2 3 2013 5
Any suggestions, preferably using dplyr?
I have found similar questions, but none addressing this specific issue.
r dataframe dplyr
asked Mar 22 at 12:00 by Lucas (edited Mar 22 at 12:37 by Ronak Shah)
5 Answers
We can use `dplyr`: group by `Block` and `Year`, subtract each `Value` from the `Value` of the "Control" row in its group, and remove the "Control" rows.
library(dplyr)
df %>%
  group_by(Block, Year) %>%
  # Control value of the group minus each row's own value
  mutate(Value = Value[which.max(Treatment == "Control")] - Value) %>%
  filter(Treatment != "Control")
# Treatment Block Year Value
# <fct> <int> <int> <dbl>
#1 Treat1 1 2011 -6
#2 Treat2 1 2011 2
#3 Treat1 2 2012 -6
#4 Treat2 2 2012 -2
#5 Treat1 3 2013 3
#6 Treat2 3 2013 5
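The `which.max(Treatment == "Control")` idiom is what picks each group's baseline. A small base-R sketch of how it behaves, using illustrative vectors `treatment` and `value` that stand in for one group (Block 1's values, reordered so Control is not first). It also covers a group with no Control row, and shows `match()` as an alternative that yields `NA` instead of a silently wrong baseline:

```r
# One group's worth of data (Block 1's values, reordered for illustration)
treatment <- c("Treat1", "Control", "Treat2")
value     <- c(12, 6, 4)

# which.max() on a logical vector returns the position of the first TRUE
idx   <- which.max(treatment == "Control")   # 2
diffs <- value[idx] - value                  # -6 0 2

# Caveat: with no TRUE at all, which.max() returns 1 (the first FALSE),
# so a group lacking a Control row would silently use its first row
no_ctrl <- which.max(c(FALSE, FALSE))        # 1

# match() returns NA instead, so a missing baseline shows up as NA
missing_ctrl <- value[match("Control", c("Treat1", "Treat2"))]   # NA
```

Inside the pipeline above, `Value[match("Control", Treatment)]` would be the corresponding drop-in if you prefer missing Control rows to surface as `NA`.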
I'm not sure whether the values in the `Treatment` column of the expected output (`Control-Treat1`, `Control-Treat2`) are only meant to illustrate the calculation or whether the OP really wants them in the output. If they are needed, we can use:
df %>%
  group_by(Block, Year) %>%
  mutate(Value = Value[which.max(Treatment == "Control")] - Value,
         Treatment = paste0("Control-", Treatment)) %>%
  filter(Treatment != "Control-Control")
# Treatment Block Year Value
# <chr> <int> <int> <dbl>
#1 Control-Treat1 1 2011 -6
#2 Control-Treat2 1 2011 2
#3 Control-Treat1 2 2012 -6
#4 Control-Treat2 2 2012 -2
#5 Control-Treat1 3 2013 3
#6 Control-Treat2 3 2013 5
answered Mar 22 at 12:04 by Ronak Shah (edited Mar 22 at 12:11)
Exactly what I was looking for, thank you very much!
– Lucas
Mar 22 at 14:43
A somewhat different `tidyverse` possibility could be:
library(dplyr)
library(tidyr)
df %>%
  spread(Treatment, Value) %>%
  gather(var, val, -c(Block, Year, Control)) %>%
  mutate(Value = Control - val,
         Treatment = paste("Control", var, sep = " - ")) %>%
  select(Treatment, Block, Year, Value) %>%
  arrange(Block)
Treatment Block Year Value
1 Control - Treat1 1 2011 -6
2 Control - Treat2 1 2011 2
3 Control - Treat1 2 2012 -6
4 Control - Treat2 2 2012 -2
5 Control - Treat1 3 2013 3
6 Control - Treat2 3 2013 5
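Since tidyr 1.0.0, `spread()` and `gather()` are superseded by `pivot_wider()` and `pivot_longer()`. A sketch of the same reshape-and-subtract idea with the newer verbs, assuming tidyr 1.0.0 or later is installed (`res` is an illustrative name; `sep = "-"` is used so the labels match the question's desired output):

```r
library(dplyr)
library(tidyr)

# Data from the question (stringsAsFactors = FALSE for consistent types across R versions)
df <- data.frame(Treatment = rep(c("Control", "Treat1", "Treat2"), 3),
                 Block = rep(1:3, each = 3), Year = rep(2011:2013, each = 3),
                 Value = c(6, 12, 4, 3, 9, 5, 6, 3, 1),
                 stringsAsFactors = FALSE)

res <- df %>%
  # One row per Block/Year, with columns Control, Treat1, Treat2
  pivot_wider(names_from = Treatment, values_from = Value) %>%
  # Back to long form for the treatment columns only; Control stays alongside
  pivot_longer(-c(Block, Year, Control), names_to = "var", values_to = "val") %>%
  mutate(Value = Control - val,
         Treatment = paste("Control", var, sep = "-")) %>%
  select(Treatment, Block, Year, Value) %>%
  arrange(Block)
```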
answered Mar 22 at 12:14 by tmfmnk
This can be done with an SQL self-join like this:
library(sqldf)
sqldf("select a.Treatment || '-' || b.Treatment as Treatment,
              a.Block,
              a.Year,
              a.Value - b.Value as Value
       from df a
       join df b on a.Block = b.Block and
                    a.Year = b.Year and
                    a.Treatment = 'Control' and
                    b.Treatment != 'Control'")
giving:
Treatment Block Year Value
1 Control-Treat1 1 2011 -6
2 Control-Treat2 1 2011 2
3 Control-Treat1 2 2012 -6
4 Control-Treat2 2 2012 -2
5 Control-Treat1 3 2013 3
6 Control-Treat2 3 2013 5
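For readers without sqldf, the same self-join can be sketched in base R with `merge()`, joining the Control rows to the treatment rows on both `Block` and `Year`. The names `ctrl`, `trt`, `m` and `res` are illustrative, and the explicit `order()` call is there so the sketch does not depend on the row order `merge()` happens to produce:

```r
# Data from the question
df <- data.frame(Treatment = rep(c("Control", "Treat1", "Treat2"), 3),
                 Block = rep(1:3, each = 3), Year = rep(2011:2013, each = 3),
                 Value = c(6, 12, 4, 3, 9, 5, 6, 3, 1),
                 stringsAsFactors = FALSE)

ctrl <- df[df$Treatment == "Control", ]   # the "a" side of the self-join
trt  <- df[df$Treatment != "Control", ]   # the "b" side of the self-join

# Inner join on both keys; suffixes mark which side each column came from
m <- merge(ctrl, trt, by = c("Block", "Year"), suffixes = c(".ctrl", ".trt"))

res <- data.frame(Treatment = paste(m$Treatment.ctrl, m$Treatment.trt, sep = "-"),
                  Block = m$Block,
                  Year = m$Year,
                  Value = m$Value.ctrl - m$Value.trt,
                  stringsAsFactors = FALSE)
res <- res[order(res$Block, res$Year, res$Treatment), ]
rownames(res) <- NULL
```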
answered Mar 22 at 13:25 by G. Grothendieck (edited Mar 25 at 13:12)
Another `dplyr`-`tidyr` approach. You can remove unwanted columns with `select`:
library(tidyr)
library(dplyr)
df %>%
  spread(Treatment, Value) %>%
  gather(key, value, Treat1:Treat2) %>%
  group_by(Block, Year, key) %>%
  mutate(Val = Control - value)
# A tibble: 6 x 6
# Groups: Block, Year, key [6]
Block Year Control key value Val
<int> <int> <dbl> <chr> <dbl> <dbl>
1 1 2011 6 Treat1 12 -6
2 2 2012 3 Treat1 9 -6
3 3 2013 6 Treat1 3 3
4 1 2011 6 Treat2 4 2
5 2 2012 3 Treat2 5 -2
6 3 2013 6 Treat2 1 5
To get just the exact output:
df %>%
  spread(Treatment, Value) %>%
  gather(key, value, Treat1:Treat2) %>%
  mutate(Treatment = paste0("Control-", key)) %>%
  group_by(Block, Year, Treatment) %>%
  mutate(Val = Control - value) %>%
  select(Treatment, everything(), -value, -key) %>%
  arrange(Year)
Result:
# A tibble: 6 x 5
# Groups: Block, Year, Treatment [6]
Treatment Block Year Control Val
<chr> <int> <int> <dbl> <dbl>
1 Control-Treat1 1 2011 6 -6
2 Control-Treat2 1 2011 6 2
3 Control-Treat1 2 2012 3 -6
4 Control-Treat2 2 2012 3 -2
5 Control-Treat1 3 2013 6 3
6 Control-Treat2 3 2013 6 5
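The `# Groups:` line in both printouts shows that the result is still a grouped tibble, which can affect later verbs such as another `mutate()` or `summarise()`. Because each (Block, Year, key) group has exactly one row and `Control - value` is row-wise, the grouping does not change the subtraction itself. A minimal sketch, assuming dplyr and tidyr are installed (`grouped` and `res` are illustrative names), that inspects the grouping with `group_vars()`, drops it with `ungroup()`, and keeps only the question's four columns:

```r
library(dplyr)
library(tidyr)

# Data from the question
df <- data.frame(Treatment = rep(c("Control", "Treat1", "Treat2"), 3),
                 Block = rep(1:3, each = 3), Year = rep(2011:2013, each = 3),
                 Value = c(6, 12, 4, 3, 9, 5, 6, 3, 1),
                 stringsAsFactors = FALSE)

grouped <- df %>%
  spread(Treatment, Value) %>%
  gather(key, value, Treat1:Treat2) %>%
  group_by(Block, Year, key) %>%
  mutate(Val = Control - value)

group_vars(grouped)   # "Block" "Year" "key": still grouped

# Drop the grouping, then build the label and keep the four requested columns
res <- grouped %>%
  ungroup() %>%
  transmute(Treatment = paste0("Control-", key), Block, Year, Value = Val) %>%
  arrange(Year)
```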
answered Mar 22 at 12:07 by NelsonGon (edited Mar 22 at 12:19)
Another `tidyverse` solution. We can use `filter` to split the "Control" rows and the treatment rows into separate data frames, use `left_join` to combine them by `Block` and `Year`, and then compute the differences and combine the labels.
library(tidyverse)
df2 <- df %>%
filter(!Treatment %in% "Control") %>%
left_join(df %>% filter(Treatment %in% "Control"),
.,
by = c("Block", "Year")) %>%
mutate(Value = Value.x - Value.y) %>%
unite(Treatment, Treatment.x, Treatment.y, sep = "-") %>%
select(names(df))
# Treatment Block Year Value
# 1 Control-Treat1 1 2011 -6
# 2 Control-Treat2 1 2011 2
# 3 Control-Treat1 2 2012 -6
# 4 Control-Treat2 2 2012 -2
# 5 Control-Treat1 3 2013 3
# 6 Control-Treat2 3 2013 5
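The `.` in `left_join(df %>% filter(...), ., ...)` is the magrittr placeholder: when it appears as an argument, the piped object goes there instead of into the first slot, which is how the Control rows end up as the left table. A toy sketch with hypothetical data frames `x` and `y`:

```r
library(dplyr)

x <- data.frame(k = 1:2, a = c("p", "q"), stringsAsFactors = FALSE)
y <- data.frame(k = 1:2, b = c("r", "s"), stringsAsFactors = FALSE)

# `.` marks where the piped object (y) goes: this is left_join(x, y, by = "k")
res <- y %>% left_join(x, ., by = "k")
names(res)   # "k" "a" "b": x's columns come first because x is the left table
```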
answered Mar 22 at 13:59 by www