lm(formula) in R behaves differently within parLapply
First I create a pair of example dataframes:
df = data.frame("sample1" = runif(10), "sample2" = runif(10), "sample3" = runif(10), "sample4" = runif(10))
traits = data.frame("var1" = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) = colnames(df)
If I create a formula as a text string, I can plug it right into lm():
> row = t(df[1,])
> ModString = "row ~ traits$var1"
> Mod = lm(as.formula(ModString))
> Mod
Call:
lm(formula = as.formula(ModString))
Coefficients:
(Intercept) traits$var1group2
0.7799 0.1788
But if I try to do the same thing with parLapply, I get an error indicating that the "traits" argument was not working as expected:
> num_cores <- detectCores() - 1
> cl <- makeCluster(num_cores)
> results <- parLapply(cl = cl, seq(1:10), function(i, df, traits) {
+   row = df[i,]
+   ModString = "vector ~ traits$factor1"
+   Mod = lm(ModString)
+   return(Mod)
+ }, df = df, traits = traits)
Error in checkForRemoteErrors(val) :
9 nodes produced errors; first error: object 'traits' not found
What is strange is that the "traits" argument IS making it into parLapply as I am using it, so the problem seems to be something about the way lm() works. I can pass in and return "traits" just fine:
> cl <- makeCluster(num_cores)
> results <- parLapply(cl = cl, seq(1:10), function(i, df, traits) {
+   row = df[i,]
+   traits2 = traits
+   ModString = "vector ~ traits$factor1"
+   return(list(traits2, row, ModString))
+ }, df = df, traits = traits)
> results
[[1]]
[[1]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[1]][[2]]
sample1 sample2 sample3 sample4
1 0.6941108 0.8656177 0.9807334 0.936609
[[1]][[3]]
[1] "vector ~ traits$factor1"
[[2]]
[[2]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[2]][[2]]
sample1 sample2 sample3 sample4
2 0.1007983 0.5599374 0.0208095 0.8082196
[[2]][[3]]
[1] "vector ~ traits$factor1"
[[3]]
[[3]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[3]][[2]]
sample1 sample2 sample3 sample4
3 0.9633059 0.7564143 0.913617 0.4179525
[[3]][[3]]
[1] "vector ~ traits$factor1"
[[4]]
[[4]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[4]][[2]]
sample1 sample2 sample3 sample4
4 0.06625104 0.390351 0.511572 0.8386714
[[4]][[3]]
[1] "vector ~ traits$factor1"
[[5]]
[[5]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[5]][[2]]
sample1 sample2 sample3 sample4
5 0.6135228 0.4926991 0.08513074 0.105647
[[5]][[3]]
[1] "vector ~ traits$factor1"
[[6]]
[[6]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[6]][[2]]
sample1 sample2 sample3 sample4
6 0.7121677 0.6554129 0.6409468 0.4906039
[[6]][[3]]
[1] "vector ~ traits$factor1"
[[7]]
[[7]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[7]][[2]]
sample1 sample2 sample3 sample4
7 0.4651641 0.546514 0.4039608 0.1758802
[[7]][[3]]
[1] "vector ~ traits$factor1"
[[8]]
[[8]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[8]][[2]]
sample1 sample2 sample3 sample4
8 0.5121237 0.4950444 0.9662431 0.6851582
[[8]][[3]]
[1] "vector ~ traits$factor1"
[[9]]
[[9]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[9]][[2]]
sample1 sample2 sample3 sample4
9 0.2486208 0.135422 0.2128657 0.7332921
[[9]][[3]]
[1] "vector ~ traits$factor1"
[[10]]
[[10]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[10]][[2]]
sample1 sample2 sample3 sample4
10 0.06203028 0.7916495 0.3528376 0.2259685
[[10]][[3]]
[1] "vector ~ traits$factor1"
What embarrassingly trivial detail am I missing here?
r parallel-processing apply lm
asked Mar 27 at 21:00
Stonecraft
I have very little experience with parLapply, but I can tell you that you should not be building formulas with $ at all. The formula should contain the variable names of columns in the data argument, e.g. lm(var1 ~ var2 + var3, data = my_data), where var1, etc. are all columns in my_data.
– joran
Mar 27 at 21:04
same thing happens if I index by name
– Stonecraft
Mar 27 at 21:10
Yeah, the problem is with your fundamental approach and the whole way that your data is organized. Minor tweaks while still holding the data in two different data frames, one of which has columns as rows, are just going to be so complicated to manage that you're going to run into lots of problems.
– joran
Mar 27 at 21:12
Unfortunately it is not avoidable in my situation, this was just the simplest example I could make.
– Stonecraft
Mar 27 at 21:16
It's totally doable, and much cleaner...I'll post an answer.
– joran
Mar 27 at 21:19
2 Answers
I would do it this way; note the radically different data organization:
library(dplyr)
library(tidyr)
library(tibble)
library(parallel)
# You seem to have rows of data that should be columns;
# this puts things in a form more suitable for work in R
df_new <- df %>%
  mutate(row = 1:n()) %>%
  gather(key = sample, value = val, sample1:sample4) %>%
  arrange(row, sample)
# Data in rownames is not terribly useful
traits_new <- rownames_to_column(traits, "sample")
# Now we can put it all in *one* data frame
df_new <- left_join(df_new,
                    traits_new,
                    by = "sample")
# ...and split it into a list representing each of the df's you
# want a lm() fit on
df_new_split <- split(df_new, df_new$row)
# Wrapper for lm with the only formula we need
fit_lm <- function(x) {
  lm(val ~ var1, data = x)
}
num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)
results <- parLapply(cl = cl, df_new_split, fit_lm)
Thanks, but I really need the data to be organized this way because of what I do with it downstream. My real question is why the string is interpreted differently in the two contexts.
– Stonecraft
Mar 27 at 21:22
@Thoughtcraft The reason is bc lm() isn't meant to be used in the way you are. The ability to pass variable names in formulae that are looked up in the calling environment (how the first attempt works) is meant for simple, interactive work, not programming. All the data transformations I show can be undone later.
– joran
Mar 27 at 21:25
Sure, but then why does lm work with the same string and data outside parLapply? This is what I want to figure out.
– Stonecraft
Mar 27 at 21:26
@Thoughtcraft Because when you aren't splitting things into multiple processes, R has a system for looking up names in successive environments. Even then, things could go very wrong if you're doing it inside nested functions. But using parLapply I would have zero assumptions about what each process can access from your global environment.
– joran
Mar 27 at 21:30
@Thoughtcraft Having access to, and knowing to look there are two different things. When you make an assignment like traits2 <- traits, R has a process for looking for traits that includes checking the arguments to the function you're inside. But remember that once you call lm() you've descended a whole layer of scope. Everything that happens there is in another whole new environment. And the way lm works is that it checks the data argument, and then it checks the calling environment; merely passing an argument doesn't create the actual argument in that environment.
– joran
Mar 27 at 21:59
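The scoping point in the comments above can be tried without any cluster. The following is only a rough sketch: the function name fit_from_string and the variables resp_local and pred_local are made up for illustration and do not come from the question. As far as I understand it, a formula remembers the environment in which it was created, so calling as.formula() inside the function lets lm() find the function's local variables, whereas a bare string only becomes a formula later, inside the modelling code, where those locals may not be visible.

```r
# Hypothetical helper; names and data are invented for this sketch.
fit_from_string <- function(use_as_formula) {
  resp_local <- c(1.0, 2.1, 2.9, 4.2)
  pred_local <- c(1, 2, 3, 4)
  mod_string <- "resp_local ~ pred_local"
  if (use_as_formula) {
    # as.formula() is called here, so the formula keeps a reference to this
    # function's environment, where resp_local and pred_local live.
    lm(as.formula(mod_string))
  } else {
    # The string is converted to a formula only later, inside the modelling code.
    lm(mod_string)
  }
}

# Formula built inside the function: the local variables are found.
ok <- fit_from_string(TRUE)
print(coef(ok))

# Bare string: wrapped in tryCatch so the script keeps running either way.
res <- tryCatch(fit_from_string(FALSE),
                error = function(e) conditionMessage(e))
print(res)
```

If the second call reports an "object ... not found" message, that would be the same kind of failure as in the question, where the worker processes have no traits object in their global environment to fall back on.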
OK, I feel really silly, but I'm going to leave the question up because it's a great example of how easy it is to get confused when copy-pasting and editing multiple versions of code. I did not consistently use as.formula in my parLapply call, and I also forgot to change the variable name vector to row and to transpose it.
So. The following works just dandy:
require(parallel)
num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)
results <- parLapply(cl = cl, seq(1:10), function(i, df, traits) {
  row = t(df[i,])
  # single quotes inside the string, so the formula text is valid R
  ModString = "row ~ traits[,'var1']"
  Mod = lm(as.formula(ModString))
  return(Mod)
}, df = df, traits = traits)
Don't get why this answer is not useful. It is the cause of my problem. Although I won't argue with anyone who downvotes the question, as it was rather poorly considered.
– Stonecraft
Mar 28 at 3:15
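For what it's worth, the two answers can be combined: keep the wide layout from the question, but build a small data frame per row inside the worker and pass it through the data argument, so the formula only names columns. This is a minimal sketch under stated assumptions, not a tested recommendation: the helper name fit_row, the seed, and the choice of two workers are all assumptions made for illustration.

```r
library(parallel)

# Example data in the same layout as the question (values are random).
set.seed(1)
df <- data.frame(sample1 = runif(10), sample2 = runif(10),
                 sample3 = runif(10), sample4 = runif(10))
traits <- data.frame(var1 = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) <- colnames(df)

# Hypothetical helper: fit one model for row i of df.
fit_row <- function(i, df, traits) {
  # One observation per sample; traits is matched to the samples by name.
  d <- data.frame(val = unlist(df[i, ]),
                  var1 = traits[colnames(df), "var1"])
  # The formula refers only to columns of d, supplied via the data argument.
  lm(val ~ var1, data = d)
}

cl <- makeCluster(2)  # two workers, chosen arbitrarily for this sketch
results <- parLapply(cl, seq_len(nrow(df)), fit_row, df = df, traits = traits)
stopCluster(cl)       # release the worker processes when done

print(length(results))
print(coef(results[[1]]))
```

Passing df and traits as extra arguments to parLapply ships them to the workers explicitly, and using data = d means nothing has to be looked up in a worker's global environment.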
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55386373%2flmformula-in-r-behaves-differently-within-parlapply%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
I would do it this way; note the radically different data organization:
library(dplyr)
library(tidyr)
library(tibble)
library(parallel)
#You seem to have rows of data that should be columns,
# this puts things in a form more suitable for work in R
df_new <- df %>%
mutate(row = 1:n()) %>%
gather(key = sample,value = val,sample1:sample4) %>%
arrange(row,sample)
#Data in rownames is not terribly useful
traits_new <- rownames_to_column(traits,"sample")
#Now we can put it all in *one* data frame
df_new <- left_join(df_new,
traits_new,
by = "sample")
#...and split it into a list representing each of the df's you
# want a lm() fit on
df_new_split <- split(df_new,df_new$row)
#Wrapper for lm with the only formula we need
fit_lm <- function(x)
lm(val ~ var1,data = x)
num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)
results <- parLapply(cl = cl,df_new_split,fit_lm)
Thanks, but I really need the data to be organized this way because of what I do with it downstream. My real question is why the string is interpreted differently in the two contexts.
– Stonecraft
Mar 27 at 21:22
@Thoughtcraft The reason is bc lm() isn't meant to be used in the way you are. The ability to pass variable names in formulae that are looked up in the calling environment (how the first attempt works) is meant for simple, interactive work, not programming. All the data transformations I show can be undone later.
– joran
Mar 27 at 21:25
Sure, but then why does lm work with the same string and data outside parLapply? This is what I want to figure out.
– Stonecraft
Mar 27 at 21:26
@Thoughtcraft Because when you aren't splitting things into multiple processes, R has a system for looking up name in successive environments. Even then, things could go very wrong if you're doing it inside nested functions. But usingparLapply
I would have zero assumptions about what each process can access from your global environment.
– joran
Mar 27 at 21:30
1
@Thoughtcraft Having access to, and knowing to look there are two different things. When you make an assignment liketraits2 <- traits
, R has a process for looking fortraits
that includes checking argument to the function you're inside. But remember that once you calllm()
you've descended a whole layer of scope. Everything that happens there is in another whole new environment. And the way lm works is that it checks thedata
argument, and then it checks the calling environment; merely passing an argument doesn't create the actual argument in that environment.
– joran
Mar 27 at 21:59
|
show 3 more comments
I would do it this way; note the radically different data organization:
library(dplyr)
library(tidyr)
library(tibble)
library(parallel)
#You seem to have rows of data that should be columns,
# this puts things in a form more suitable for work in R
df_new <- df %>%
mutate(row = 1:n()) %>%
gather(key = sample,value = val,sample1:sample4) %>%
arrange(row,sample)
#Data in rownames is not terribly useful
traits_new <- rownames_to_column(traits,"sample")
#Now we can put it all in *one* data frame
df_new <- left_join(df_new,
traits_new,
by = "sample")
#...and split it into a list representing each of the df's you
# want a lm() fit on
df_new_split <- split(df_new,df_new$row)
#Wrapper for lm with the only formula we need
fit_lm <- function(x)
lm(val ~ var1,data = x)
num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)
results <- parLapply(cl = cl,df_new_split,fit_lm)
Thanks, but I really need the data to be organized this way because of what I do with it downstream. My real question is why the string is interpreted differently in the two contexts.
– Stonecraft
Mar 27 at 21:22
@Thoughtcraft The reason is bc lm() isn't meant to be used in the way you are. The ability to pass variable names in formulae that are looked up in the calling environment (how the first attempt works) is meant for simple, interactive work, not programming. All the data transformations I show can be undone later.
– joran
Mar 27 at 21:25
Sure, but then why does lm work with the same string and data outside parLapply? This is what I want to figure out.
– Stonecraft
Mar 27 at 21:26
@Thoughtcraft Because when you aren't splitting things into multiple processes, R has a system for looking up name in successive environments. Even then, things could go very wrong if you're doing it inside nested functions. But usingparLapply
I would have zero assumptions about what each process can access from your global environment.
– joran
Mar 27 at 21:30
1
@Thoughtcraft Having access to something and knowing to look there are two different things. When you make an assignment like traits2 <- traits, R has a process for looking for traits that includes checking the arguments to the function you're inside. But remember that once you call lm() you've descended a whole layer of scope. Everything that happens there is in another whole new environment. And the way lm works is that it checks the data argument, and then it checks the calling environment; merely passing an argument doesn't create the actual argument in that environment.
– joran
Mar 27 at 21:59
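A small sketch of the lookup described above, using made-up names (x_global, fit_in_function, make_formula, fit_elsewhere are all hypothetical). Strictly speaking, when no data argument is given, lm() resolves variables through the environment attached to the formula, which is usually the frame where the formula was written; that is often, but not always, the same as the calling environment.

```r
# Toy data; every name here is illustrative and not from the question.
set.seed(1)
x_global <- rnorm(20)

fit_in_function <- function(y) {
  # The formula is created here, so its environment is this function's
  # frame: `y` is found locally and `x_global` in the enclosing environment.
  lm(y ~ x_global)
}

fit_ok <- fit_in_function(2 * x_global + rnorm(20))

make_formula <- function() {
  y ~ x_global   # environment: make_formula()'s frame, which has no `y`
}

fit_elsewhere <- function(y) {
  # `y` exists in this frame, but lm() looks in `data` and then in
  # environment(formula), not in the frame that happens to call lm().
  tryCatch(lm(make_formula()), error = function(e) conditionMessage(e))
}

msg <- fit_elsewhere(rnorm(20))
print(class(fit_ok))
print(msg)
```

The second call fails even though y is an argument of the function that calls lm(), which is the "merely passing an argument" situation from the comment.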
answered Mar 27 at 21:20
joran
OK I feel really silly but I'm going to leave the question up because it's a great example of how easy it is to get confused when copy-pasting and editing multiple versions of code. I did not consistently use as.formula in my parLapply, and also forgot to change the variable name vector to row and to transpose it.
So. The following works just dandy:
require(parallel)
num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)
results <- parLapply(cl = cl, 1:10, function(i, df, traits) {
  row = t(df[i, ])
  # Single quotes inside the string so the inner 'var1' does not end it
  ModString = "row ~ traits[, 'var1']"
  Mod = lm(as.formula(ModString))
  return(Mod)
}, df = df, traits = traits)
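For anyone who wants to keep the original two-object layout, here is a self-contained sketch of the same idea that builds a small data frame per task and passes it through the data argument, so nothing has to be looked up in an environment on the worker. The toy df and traits below are assumptions about the shapes in the question (rows of df are features, columns are samples; rows of traits are samples), and fit_row is a hypothetical helper name.

```r
library(parallel)

# Toy stand-ins for the question's objects (shapes are assumed):
# `df` has one row per feature and one column per sample;
# `traits` has one row per sample.
set.seed(42)
df <- as.data.frame(matrix(rnorm(40), nrow = 10,
                           dimnames = list(NULL, paste0("sample", 1:4))))
traits <- data.frame(var1 = c(1.2, 3.4, 2.2, 5.1),
                     row.names = paste0("sample", 1:4))

fit_row <- function(i, df, traits) {
  # Build a per-task data frame, aligning traits to samples by name,
  # so lm() finds both variables in `data`.
  d <- data.frame(val = unlist(df[i, ]), var1 = traits[names(df), "var1"])
  lm(val ~ var1, data = d)
}

cl <- makeCluster(2)   # 2 workers is an arbitrary choice for this sketch
results <- tryCatch(
  parLapply(cl, seq_len(nrow(df)), fit_row, df = df, traits = traits),
  finally = stopCluster(cl)   # always release the workers
)

print(length(results))
```

The formula here is a plain val ~ var1, so no string building or as.formula is needed.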
Don't get why this answer is not useful. It is the cause of my problem. Although I won't argue with anyone who downvotes the question, as it was rather poorly considered.
– Stonecraft
Mar 28 at 3:15
answered Mar 27 at 22:14
Stonecraft
I have very little experience with parLapply, but I can tell you that you should not be building formulas with $ at all. The formula should contain the variable names of columns in the data argument, e.g. lm(var1 ~ var2 + var3, data = my_data), where var1, etc. are all columns in my_data.
– joran
Mar 27 at 21:04
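A short sketch of that advice with a made-up data frame (my_data and its values are hypothetical; the column names mirror the comment's example). It also shows reformulate() from the stats package as one way to build a formula from strings without resorting to $ or indexing inside the formula.

```r
# Hypothetical data frame; the column names mirror the comment's example.
set.seed(7)
my_data <- data.frame(var2 = rnorm(30), var3 = rnorm(30))
my_data$var1 <- 1 + 2 * my_data$var2 - my_data$var3 + rnorm(30, sd = 0.1)

# Columns are named in the formula and resolved through `data`.
fit <- lm(var1 ~ var2 + var3, data = my_data)

# Because the terms are plain column names, predict() can match them
# against new data.
new_rows <- data.frame(var2 = c(0, 1), var3 = c(0, 0))
preds <- predict(fit, newdata = new_rows)

# Building the same formula from strings, still without `$`:
f <- reformulate(c("var2", "var3"), response = "var1")
fit2 <- lm(f, data = my_data)

print(all.equal(coef(fit), coef(fit2)))
```

With terms like my_data$var2 in the formula, predict() on new data would not be able to substitute the new columns, which is one practical reason to prefer this form.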
same thing happens if I index by name
– Stonecraft
Mar 27 at 21:10
Yeah, the problem is with your fundamental approach and the whole way that your data is organized. Making minor tweaks while still holding the data in two different data frames, one of which has columns as rows, is just going to be so complicated to manage that you're going to run into lots of problems.
– joran
Mar 27 at 21:12
Unfortunately it is not avoidable in my situation, this was just the simplest example I could make.
– Stonecraft
Mar 27 at 21:16
It's totally doable, and much cleaner...I'll post an answer.
– joran
Mar 27 at 21:19