lm(formula) in R behaves differently within parLapply









0















First I create a pair of example dataframes:



df = data.frame("sample1" = runif(10), "sample2" = runif(10), "sample3" = runif(10), "sample4" = runif(10))
traits = data.frame("var1" = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) = colnames(df)


If I create a formula as a text string, I can plug it right into lm():



> row = t(df[1,])
> ModString = "row ~ traits$var1"
> Mod = lm(as.formula(ModString))
> Mod

Call:
lm(formula = as.formula(ModString))

Coefficients:
      (Intercept)  traits$var1group2
           0.7799             0.1788


But if I try to do the same thing with parLapply, I get an error indicating that the "traits" argument was not working as expected:



> num_cores <- detectCores() - 1
> cl <- makeCluster(num_cores)
> results <- parLapply(cl = cl, seq(1:10), function(i, df, traits) {
+   row = df[i,]
+   ModString = "vector ~ traits$factor1"
+   Mod = lm(ModString)
+   return(Mod)
+ }, df = df, traits = traits)
Error in checkForRemoteErrors(val) :
9 nodes produced errors; first error: object 'traits' not found


What is strange is that the "traits" argument IS making it into the function I pass to parLapply, so the problem seems to be something about the way lm() works. I can pass in and return "traits" just fine:



> cl <- makeCluster(num_cores)
> results <- parLapply(cl = cl, seq(1:10), function(i, df, traits) {
+   row = df[i,]
+   traits2 = traits
+   ModString = "vector ~ traits$factor1"
+   return(list(traits2, row, ModString))
+ }, df = df, traits = traits)
> results
[[1]]
[[1]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[1]][[2]]
sample1 sample2 sample3 sample4
1 0.6941108 0.8656177 0.9807334 0.936609

[[1]][[3]]
[1] "vector ~ traits$factor1"


[[2]]
[[2]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[2]][[2]]
sample1 sample2 sample3 sample4
2 0.1007983 0.5599374 0.0208095 0.8082196

[[2]][[3]]
[1] "vector ~ traits$factor1"


[[3]]
[[3]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[3]][[2]]
sample1 sample2 sample3 sample4
3 0.9633059 0.7564143 0.913617 0.4179525

[[3]][[3]]
[1] "vector ~ traits$factor1"


[[4]]
[[4]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[4]][[2]]
sample1 sample2 sample3 sample4
4 0.06625104 0.390351 0.511572 0.8386714

[[4]][[3]]
[1] "vector ~ traits$factor1"


[[5]]
[[5]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[5]][[2]]
sample1 sample2 sample3 sample4
5 0.6135228 0.4926991 0.08513074 0.105647

[[5]][[3]]
[1] "vector ~ traits$factor1"


[[6]]
[[6]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[6]][[2]]
sample1 sample2 sample3 sample4
6 0.7121677 0.6554129 0.6409468 0.4906039

[[6]][[3]]
[1] "vector ~ traits$factor1"


[[7]]
[[7]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[7]][[2]]
sample1 sample2 sample3 sample4
7 0.4651641 0.546514 0.4039608 0.1758802

[[7]][[3]]
[1] "vector ~ traits$factor1"


[[8]]
[[8]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[8]][[2]]
sample1 sample2 sample3 sample4
8 0.5121237 0.4950444 0.9662431 0.6851582

[[8]][[3]]
[1] "vector ~ traits$factor1"


[[9]]
[[9]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[9]][[2]]
sample1 sample2 sample3 sample4
9 0.2486208 0.135422 0.2128657 0.7332921

[[9]][[3]]
[1] "vector ~ traits$factor1"


[[10]]
[[10]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[10]][[2]]
sample1 sample2 sample3 sample4
10 0.06203028 0.7916495 0.3528376 0.2259685

[[10]][[3]]
[1] "vector ~ traits$factor1"


What embarrassingly trivial detail am I missing here?










  • I have very little experience with parLapply, but I can tell you that you should not be building formulas with $ at all. The formula should contain the variable names of columns in the data argument. e.g. lm(var1 ~ var2 + var3,data = my_data) where var1, etc are all columns in my_data.

    – joran
    Mar 27 at 21:04











  • same thing happens if I index by name

    – Stonecraft
    Mar 27 at 21:10











  • Yeah, the problem is with your fundamental approach and the whole way that your data is organized. Minor tweaks while still holding the data in two different data frames, one of which has columns as rows, are just going to be so complicated to manage that you're going to run into lots of problems.

    – joran
    Mar 27 at 21:12











  • Unfortunately it is not avoidable in my situation, this was just the simplest example I could make.

    – Stonecraft
    Mar 27 at 21:16











  • It's totally doable, and much cleaner...I'll post an answer.

    – joran
    Mar 27 at 21:19
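
The `data` argument approach suggested in the comments can be sketched without changing how the two data frames are stored. This is only an illustration: the helper name `fit_row` and the column names `y` and `group` are assumptions made for the example, not part of the original post.

```r
# Example data with the same layout as in the question.
set.seed(1)
df <- data.frame(sample1 = runif(10), sample2 = runif(10),
                 sample3 = runif(10), sample4 = runif(10))
traits <- data.frame(var1 = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) <- colnames(df)

# Hypothetical helper: build a small data frame for one row of `df`
# so that lm() finds both variables in `data` instead of searching
# through environments.
fit_row <- function(i, df, traits) {
  d <- data.frame(y = unlist(df[i, ]),
                  group = traits[colnames(df), "var1"])
  lm(y ~ group, data = d)
}

# Serial version; every variable the formula needs travels inside `d`.
fits <- lapply(seq_len(nrow(df)), fit_row, df = df, traits = traits)
```

Because `fit_row` only uses its own arguments, the same function should also work when handed to parLapply(cl, seq_len(nrow(df)), fit_row, df = df, traits = traits).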

















r parallel-processing apply lm
















asked Mar 27 at 21:00 by Stonecraft



























2 Answers


















1















I would do it this way; note the radically different data organization:



library(dplyr)
library(tidyr)
library(tibble)
library(parallel)

# You seem to have rows of data that should be columns;
# this puts things in a form more suitable for work in R
df_new <- df %>%
  mutate(row = 1:n()) %>%
  gather(key = sample, value = val, sample1:sample4) %>%
  arrange(row, sample)

# Data in rownames is not terribly useful
traits_new <- rownames_to_column(traits, "sample")

# Now we can put it all in *one* data frame
df_new <- left_join(df_new,
                    traits_new,
                    by = "sample")

# ...and split it into a list representing each of the df's you
# want a lm() fit on
df_new_split <- split(df_new, df_new$row)

# Wrapper for lm with the only formula we need
fit_lm <- function(x) {
  lm(val ~ var1, data = x)
}

num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)

results <- parLapply(cl = cl, df_new_split, fit_lm)

# Shut the workers down once the results are back
stopCluster(cl)





  • Thanks, but I really need the data to be organized this way because of what I do with it downstream. My real question is why the string is interpreted differently in the two contexts.

    – Stonecraft
    Mar 27 at 21:22











  • @Thoughtcraft The reason is that lm() isn't meant to be used the way you are using it. The ability to pass variable names in formulae that are looked up in the calling environment (how the first attempt works) is meant for simple, interactive work, not programming. All the data transformations I show can be undone later.

    – joran
    Mar 27 at 21:25











  • Sure, but then why does lm work with the same string and data outside parLapply? This is what I want to figure out.

    – Stonecraft
    Mar 27 at 21:26











  • @Thoughtcraft Because when you aren't splitting things into multiple processes, R has a system for looking up names in successive environments. Even then, things could go very wrong if you're doing it inside nested functions. But using parLapply I would make zero assumptions about what each process can access from your global environment.

    – joran
    Mar 27 at 21:30






  • @Thoughtcraft Having access to something and knowing to look there are two different things. When you make an assignment like traits2 <- traits, R has a process for looking for traits that includes checking the arguments to the function you're inside. But remember that once you call lm() you've descended a whole layer of scope. Everything that happens there is in another whole new environment. And the way lm works is that it checks the data argument, and then it checks the calling environment; merely passing an argument doesn't create the actual argument in that environment.

    – joran
    Mar 27 at 21:59
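
The lookup behaviour discussed in these comments can be tried in a single R session, without any cluster. A formula object remembers the environment it was created in, and lm() searches that environment when no `data` argument is given. The sketch below is an illustration only: the helper names `fit_string` and `fit_formula` and the deliberately unusual argument names are assumptions, chosen so that no object with the same name is likely to exist at top level.

```r
# Hypothetical helpers; the argument names are deliberately unusual so that
# nothing with the same name is likely to exist in the global environment.
fit_string <- function(resp_local, grp_local) {
  # A bare string is converted to a formula inside the model-fitting code,
  # so the resulting formula does not point back at this function's frame.
  lm("resp_local ~ grp_local")
}

fit_formula <- function(resp_local, grp_local) {
  # as.formula() is called here, so the formula's environment is this frame
  # and lm() can find both arguments.
  lm(as.formula("resp_local ~ grp_local"))
}

set.seed(1)
y <- runif(4)
g <- c("group1", "group1", "group2", "group2")

# Expected to fail with an "object ... not found" error; keep the message.
res_string <- tryCatch(fit_string(y, g),
                       error = function(e) conditionMessage(e))

# Expected to return a fitted model.
res_formula <- fit_formula(y, g)
```

In an interactive session the string version often appears to work because objects with the right names happen to sit in the global environment. On a freshly started worker they do not, which is consistent with the "object 'traits' not found" error in the question.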


















-1















OK I feel really silly but I'm going to leave the question up because it's a great example of how easy it is to get confused when copy-pasting and editing multiple versions of code. I did not consistently use as.formula in my parLapply, and also forgot to change variable name vector to row and to transpose it.



So. The following works just dandy:



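require(parallel)
num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)
results <- parLapply(cl = cl, seq(1:10), function(i, df, traits) {
  # Transpose so the row becomes a one-column response, as in the interactive example
  row = t(df[i,])
  # Single quotes outside so the inner "var1" needs no escaping
  ModString = 'row ~ traits[,"var1"]'
  # as.formula() here ties the formula to this function's environment,
  # where row and traits are defined
  Mod = lm(as.formula(ModString))
  return(Mod)
}, df = df, traits = traits)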
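
One way to see why a formula that relies on the global environment behaves differently on the workers is to check what a fresh worker can actually see. The sketch below assumes a local two-worker cluster and uses clusterEvalQ() and clusterExport() from the parallel package; the variable names `before` and `after` are made up for the illustration.

```r
library(parallel)

# Two workers are enough for the illustration
cl <- makeCluster(2)

traits <- data.frame(var1 = c("group1", "group1", "group2", "group2"))

# Ask each worker whether an object called "traits" is visible from its
# global environment before anything has been exported.
before <- unlist(clusterEvalQ(cl, exists("traits")))

# Copy `traits` from the current environment into each worker's
# global environment, then ask again.
clusterExport(cl, "traits", envir = environment())
after <- unlist(clusterEvalQ(cl, exists("traits")))

stopCluster(cl)

print(before)
print(after)
```

Exporting objects this way is one option. The code in the answer above avoids the need for it by creating the formula inside the worker function, so the lookup starts from the function's own arguments.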





  • Don't get why this answer is not useful. It is the cause of my problem. Although I won't argue with anyone who downvotes the question, as it was rather poorly considered.

    – Stonecraft
    Mar 28 at 3:15













Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55386373%2flmformula-in-r-behaves-differently-within-parlapply%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









1















I would do it this way; note the radically different data organization:



library(dplyr)
library(tidyr)
library(tibble)
library(parallel)

#You seem to have rows of data that should be columns,
# this puts things in a form more suitable for work in R
df_new <- df %>%
mutate(row = 1:n()) %>%
gather(key = sample,value = val,sample1:sample4) %>%
arrange(row,sample)

#Data in rownames is not terribly useful
traits_new <- rownames_to_column(traits,"sample")

#Now we can put it all in *one* data frame
df_new <- left_join(df_new,
traits_new,
by = "sample")

#...and split it into a list representing each of the df's you
# want a lm() fit on
df_new_split <- split(df_new,df_new$row)

#Wrapper for lm with the only formula we need
fit_lm <- function(x)
lm(val ~ var1,data = x)


num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)

results <- parLapply(cl = cl,df_new_split,fit_lm)





share|improve this answer

























  • Thanks, but I really need the data to be organized this way because of what I do with it downstream. My real question is why the string is interpreted differently in the two contexts.

    – Stonecraft
    Mar 27 at 21:22











  • @Thoughtcraft The reason is bc lm() isn't meant to be used in the way you are. The ability to pass variable names in formulae that are looked up in the calling environment (how the first attempt works) is meant for simple, interactive work, not programming. All the data transformations I show can be undone later.

    – joran
    Mar 27 at 21:25











  • Sure, but then why does lm work with the same string and data outside parLapply? This is what I want to figure out.

    – Stonecraft
    Mar 27 at 21:26











  • @Thoughtcraft Because when you aren't splitting things into multiple processes, R has a system for looking up name in successive environments. Even then, things could go very wrong if you're doing it inside nested functions. But using parLapply I would have zero assumptions about what each process can access from your global environment.

    – joran
    Mar 27 at 21:30






  • 1





    @Thoughtcraft Having access to, and knowing to look there are two different things. When you make an assignment like traits2 <- traits, R has a process for looking for traits that includes checking argument to the function you're inside. But remember that once you call lm() you've descended a whole layer of scope. Everything that happens there is in another whole new environment. And the way lm works is that it checks the data argument, and then it checks the calling environment; merely passing an argument doesn't create the actual argument in that environment.

    – joran
    Mar 27 at 21:59
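
The lookup order discussed in these comments can be checked without any cluster. The sketch below is illustrative only: `d`, `make_formula`, and the `fit_*` names are made up for the example. It relies on the documented behaviour that model variables are taken first from `data` and then from `environment(formula)`.

```r
# Toy data: x and y exist only as columns of d, not as global variables
d <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2.1, 3.9, 6.2, 7.8, 10.1))

# 1. Variables supplied through `data`
fit_data <- lm(y ~ x, data = d)

# 2. No `data`: x and y are found in the environment where the formula
#    was created, here the local() block
fit_env <- local({
  x <- d$x
  y <- d$y
  lm(y ~ x)
})

# 3. A formula built from a string remembers the environment in which
#    as.formula() was called, so x and y are still found later
make_formula <- function() {
  x <- d$x
  y <- d$y
  as.formula("y ~ x")
}
fit_string <- lm(make_formula())

# 4. At top level there is no x or y, so the same formula fails
lookup_fails <- inherits(try(lm(y ~ x), silent = TRUE), "try-error")
```

Cases 1 to 3 give the same coefficients. Case 4 fails because neither `data` nor the formula's environment contains the variables, which is the situation a worker process is in when the objects were never sent to it.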
















answered Mar 27 at 21:20









joran














-1















OK, I feel really silly, but I'm going to leave the question up because it's a great example of how easy it is to get confused when copy-pasting and editing multiple versions of code. I did not consistently use as.formula in my parLapply call, and I also forgot to rename the variable vector to row and to transpose it.



So. The following works just dandy:



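library(parallel)
num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)
results <- parLapply(cl = cl, 1:10, function(i, df, traits) {
  # Transpose row i of df so it becomes a one-column response
  row <- t(df[i, ])
  # Single quotes inside the string avoid ending it early
  ModString <- "row ~ traits[, 'var1']"
  # The formula is created inside this function, so row and traits
  # are found in the function's own environment
  Mod <- lm(as.formula(ModString))
  return(Mod)
}, df = df, traits = traits)
stopCluster(cl)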
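
For anyone who wants to reproduce this, here is a small self-contained sketch of the same idea. The toy `df` and `traits`, the helper name `fit_row`, and the two-worker cluster are assumptions for illustration, not the original data. The formula is assigned to `f` before calling `lm()` so that its environment is unambiguously the worker function's own frame.

```r
library(parallel)

# Hypothetical toy inputs: 3 features (rows) measured in 4 samples (columns)
df <- data.frame(
  sample1 = c(1.0, 2.0, 0.5),
  sample2 = c(2.1, 3.9, 0.4),
  sample3 = c(2.9, 6.1, 0.6),
  sample4 = c(4.2, 8.0, 0.5)
)
traits <- data.frame(
  var1 = c(1, 2, 3, 4),
  row.names = c("sample1", "sample2", "sample3", "sample4")
)

# df and traits are passed in as arguments, so they exist on the worker
fit_row <- function(i, df, traits) {
  row <- as.numeric(df[i, ])                    # response for feature i
  f <- as.formula("row ~ traits[, 'var1']")     # environment: this frame
  lm(f)
}

cl <- makeCluster(2)
results <- tryCatch(
  parLapply(cl, seq_len(nrow(df)), fit_row, df = df, traits = traits),
  finally = stopCluster(cl)
)

# The same function run serially, for comparison
serial <- lapply(seq_len(nrow(df)), fit_row, df = df, traits = traits)
```

Running the same function through `lapply` is a quick way to confirm that the parallel and serial fits agree.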





  • Don't get why this answer is not useful. It is the cause of my problem. Although I won't argue with anyone who downvotes the question, as it was rather poorly considered.

    – Stonecraft
    Mar 28 at 3:15















answered Mar 27 at 22:14









Stonecraft

















