lm(formula) in R behaves differently within parLapply

First I create a pair of example dataframes:

df = data.frame("sample1" = runif(10), "sample2" = runif(10), "sample3" = runif(10), "sample4" = runif(10))
traits = data.frame("var1" = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) = colnames(df)

If I create a formula as a text string, I can plug it right into lm():

> row = t(df[1,])
> ModString = "row ~ traits$var1"
> Mod = lm(as.formula(ModString))
> Mod

Call:
lm(formula = as.formula(ModString))

Coefficients:
 (Intercept) traits$var1group2 
 0.7799 0.1788

But if I try to do the same thing with parLapply, I get an error saying that "traits" was not found:

> num_cores <- detectCores() - 1
> cl <- makeCluster(num_cores)
> results <- parLapply(cl = cl, seq(1:10), function(i, df, traits) {
+   row = df[i,]
+   ModString = "vector ~ traits$factor1"
+   Mod = lm(ModString)
+   return(Mod)
+ }, df = df, traits = traits)
Error in checkForRemoteErrors(val) : 
 9 nodes produced errors; first error: object 'traits' not found

What is strange is that the "traits" argument IS making it into the function I pass to parLapply, so the problem seems to be something about the way lm() works. I can pass in and return "traits" just fine:

> cl <- makeCluster(num_cores)
> results <- parLapply(cl = cl, seq(1:10), function(i, df, traits) {
+   row = df[i,]
+   traits2 = traits
+   ModString = "vector ~ traits$factor1"
+   return(list(traits2, row, ModString))
+ }, df = df, traits = traits)
> results
[[1]]
[[1]][[1]]
 var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[1]][[2]]
 sample1 sample2 sample3 sample4
1 0.6941108 0.8656177 0.9807334 0.936609

[[1]][[3]]
[1] "vector ~ traits$factor1"


[[2]]
[[2]][[1]]
 var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[2]][[2]]
 sample1 sample2 sample3 sample4
2 0.1007983 0.5599374 0.0208095 0.8082196

[[2]][[3]]
[1] "vector ~ traits$factor1"


[[3]]
[[3]][[1]]
 var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[3]][[2]]
 sample1 sample2 sample3 sample4
3 0.9633059 0.7564143 0.913617 0.4179525

[[3]][[3]]
[1] "vector ~ traits$factor1"


[[4]]
[[4]][[1]]
 var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[4]][[2]]
 sample1 sample2 sample3 sample4
4 0.06625104 0.390351 0.511572 0.8386714

[[4]][[3]]
[1] "vector ~ traits$factor1"


[[5]]
[[5]][[1]]
 var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[5]][[2]]
 sample1 sample2 sample3 sample4
5 0.6135228 0.4926991 0.08513074 0.105647

[[5]][[3]]
[1] "vector ~ traits$factor1"


[[6]]
[[6]][[1]]
 var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[6]][[2]]
 sample1 sample2 sample3 sample4
6 0.7121677 0.6554129 0.6409468 0.4906039

[[6]][[3]]
[1] "vector ~ traits$factor1"


[[7]]
[[7]][[1]]
 var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[7]][[2]]
 sample1 sample2 sample3 sample4
7 0.4651641 0.546514 0.4039608 0.1758802

[[7]][[3]]
[1] "vector ~ traits$factor1"


[[8]]
[[8]][[1]]
 var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[8]][[2]]
 sample1 sample2 sample3 sample4
8 0.5121237 0.4950444 0.9662431 0.6851582

[[8]][[3]]
[1] "vector ~ traits$factor1"


[[9]]
[[9]][[1]]
 var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[9]][[2]]
 sample1 sample2 sample3 sample4
9 0.2486208 0.135422 0.2128657 0.7332921

[[9]][[3]]
[1] "vector ~ traits$factor1"


[[10]]
[[10]][[1]]
 var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2

[[10]][[2]]
 sample1 sample2 sample3 sample4
10 0.06203028 0.7916495 0.3528376 0.2259685

[[10]][[3]]
[1] "vector ~ traits$factor1"

What embarrassingly trivial detail am I missing here?

asked Mar 27 at 21:00 by Stonecraft

I have very little experience with parLapply, but I can tell you that you should not be building formulas with $ at all. The formula should contain the variable names of columns in the data argument. e.g. lm(var1 ~ var2 + var3,data = my_data) where var1, etc are all columns in my_data.

– joran
Mar 27 at 21:04
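
A minimal sketch of what this comment describes, using the question's example data: the values for one row and the matching trait are first gathered into a single data frame, which is then handed to lm() through its data argument. The names row_data and val are made up for this illustration, and set.seed() is added only so the random data is reproducible.

```r
set.seed(42)  # assumption: fixed seed, only for reproducibility
df <- data.frame(sample1 = runif(10), sample2 = runif(10),
                 sample3 = runif(10), sample4 = runif(10))
traits <- data.frame(var1 = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) <- colnames(df)

i <- 1
# One observation per sample; the trait is matched by sample name
# rather than by position.
row_data <- data.frame(
  val  = unlist(df[i, ]),
  var1 = traits[colnames(df), "var1"]
)

# The formula only names columns of row_data, so no `$` is needed.
mod <- lm(val ~ var1, data = row_data)
coef(mod)
```

Because every variable in the formula is a column of the data argument, the fit does not depend on which environment the formula was created in.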

same thing happens if I index by name

– Stonecraft
Mar 27 at 21:10

Yeah, the problem is with your fundamental approach and the whole way that your data is organized. Making minor tweaks while still holding the data in two different data frames, one of which has columns as rows, is going to be so complicated to manage that you're going to run into lots of problems.

– joran
Mar 27 at 21:12

Unfortunately it is not avoidable in my situation, this was just the simplest example I could make.

– Stonecraft
Mar 27 at 21:16

It's totally doable, and much cleaner...I'll post an answer.

– joran
Mar 27 at 21:19

2 Answers

I would do it this way; note the radically different data organization:

library(dplyr)
library(tidyr)
library(tibble)
library(parallel)

#You seem to have rows of data that should be columns,
# this puts things in a form more suitable for work in R
df_new <- df %>% 
 mutate(row = 1:n()) %>% 
 gather(key = sample,value = val,sample1:sample4) %>% 
 arrange(row,sample)

#Data in rownames is not terribly useful
traits_new <- rownames_to_column(traits,"sample")

#Now we can put it all in *one* data frame
df_new <- left_join(df_new,
 traits_new,
 by = "sample")

#...and split it into a list representing each of the df's you
# want a lm() fit on
df_new_split <- split(df_new,df_new$row)

#Wrapper for lm with the only formula we need
fit_lm <- function(x) {
  lm(val ~ var1, data = x)
}

num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)

results <- parLapply(cl = cl,df_new_split,fit_lm)

#Shut the workers down once the fits are back
stopCluster(cl)
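
For readers without the tidyverse installed, the same long-format split can be sketched in base R. This is an illustration rather than the answer's own code: the names df_long, df_long_split, fits and coefs are hypothetical, the fits are run with a plain lapply() to keep the example small, and set.seed() is added only for reproducibility.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility
df <- data.frame(sample1 = runif(10), sample2 = runif(10),
                 sample3 = runif(10), sample4 = runif(10))
traits <- data.frame(var1 = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) <- colnames(df)

# Long format: one line per (original row, sample) pair.
# unlist() stacks the columns one after another, so the row index cycles
# fastest and the sample name changes every nrow(df) values.
df_long <- data.frame(
  row    = rep(seq_len(nrow(df)), times = ncol(df)),
  sample = rep(colnames(df), each = nrow(df)),
  val    = unlist(df, use.names = FALSE)
)

# Attach the trait by looking each sample name up in the row names of traits.
df_long$var1 <- traits[df_long$sample, "var1"]

# One small data frame per original row, then one lm() fit per piece.
df_long_split <- split(df_long, df_long$row)
fits <- lapply(df_long_split, function(x) lm(val ~ var1, data = x))

# Collect the coefficients into a matrix with one line per original row.
coefs <- t(sapply(fits, coef))
```

Swapping lapply() for parLapply() with a cluster would give the parallel version, since the fitting function only needs the data frame it is handed.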

answered Mar 27 at 21:20 by joran

Thanks, but I really need the data to be organized this way because of what I do with it downstream. My real question is why the string is interpreted differently in the two contexts.

– Stonecraft
Mar 27 at 21:22

@Thoughtcraft The reason is that lm() isn't meant to be used the way you are using it. The ability to pass variable names in formulae that are looked up in the calling environment (how the first attempt works) is meant for simple, interactive work, not programming. All the data transformations I show can be undone later.

– joran
Mar 27 at 21:25

Sure, but then why does lm work with the same string and data outside parLapply? This is what I want to figure out.

– Stonecraft
Mar 27 at 21:26

@Thoughtcraft Because when you aren't splitting things into multiple processes, R has a system for looking up names in successive environments. Even then, things could go very wrong if you're doing it inside nested functions. But when using parLapply I would make zero assumptions about what each process can access from your global environment.

– joran
Mar 27 at 21:30

@Thoughtcraft Having access to something and knowing to look there are two different things. When you make an assignment like traits2 <- traits, R has a process for looking for traits that includes checking the arguments of the function you're inside. But remember that once you call lm() you've descended a whole layer of scope. Everything that happens there is in another whole new environment. And the way lm works is that it checks the data argument, and then it checks the calling environment; merely passing an argument doesn't create the actual argument in that environment.

– joran
Mar 27 at 21:59
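
The scoping point can be checked without any cluster. The sketch below rests on one assumption about R's behaviour: a formula remembers the environment in which it was created, and when lm() receives a plain string the formula is only created inside the modelling code, not in the function that called lm(). The function and variable names (fit_with_string, fit_with_formula, y_local, g_local) are hypothetical.

```r
# Pass the formula as a plain string. The string is converted to a formula
# deeper inside the modelling code, so the variables local to this function
# may not be visible when the model frame is built.
fit_with_string <- function(y_local, g_local) {
  tryCatch(lm("y_local ~ g_local"),
           error = function(e) conditionMessage(e))
}

# Convert the string here instead. as.formula() attaches the environment it
# is called from, which is this function's frame, so y_local and g_local
# can be found.
fit_with_formula <- function(y_local, g_local) {
  f <- as.formula("y_local ~ g_local")
  lm(f)
}

set.seed(1)  # assumption: fixed seed, only for reproducibility
vals <- runif(4)
grp  <- c("group1", "group1", "group2", "group2")

res_string  <- fit_with_string(vals, grp)
res_formula <- fit_with_formula(vals, grp)
```

If that assumption holds, res_string ends up as an error message about a missing object, while res_formula is a fitted model. At the top level of an interactive session the global environment is always on the lookup path, which would explain why the difference only shows up inside a function such as the one passed to parLapply.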

-1

OK I feel really silly but I'm going to leave the question up because it's a great example of how easy it is to get confused when copy-pasting and editing multiple versions of code. I did not consistently use as.formula in my parLapply call, and I also forgot to change the variable name vector to row and to transpose it.

So. The following works just dandy:

require(parallel)
num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)
results <- parLapply(cl = cl, seq(1:10), function(i, df, traits) {
  row = t(df[i,])
  # single quotes inside the string so the formula text parses
  ModString = "row ~ traits[,'var1']"
  Mod = lm(as.formula(ModString))
  return(Mod)
}, df = df, traits = traits)
stopCluster(cl)
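
A variant that keeps the wide layout but does not depend on where the formula is created might look like the sketch below. It is an illustration, not the code from the post: the worker builds a small data frame and passes it through the data argument. The helper fit_row, the column name val, and the fixed worker count of 2 are all assumptions chosen for the example.

```r
library(parallel)

set.seed(1)  # assumption: fixed seed, only for reproducibility
df <- data.frame(sample1 = runif(10), sample2 = runif(10),
                 sample3 = runif(10), sample4 = runif(10))
traits <- data.frame(var1 = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) <- colnames(df)

# Hypothetical helper: fit one model for row i of the wide data frame.
# Everything the formula refers to lives in row_data.
fit_row <- function(i, df, traits) {
  row_data <- data.frame(
    val  = unlist(df[i, ]),
    var1 = traits[colnames(df), "var1"]
  )
  lm(val ~ var1, data = row_data)
}

cl <- makeCluster(2)  # small fixed worker count, chosen only for this sketch
results <- tryCatch(
  parLapply(cl, seq_len(nrow(df)), fit_row, df = df, traits = traits),
  finally = stopCluster(cl)  # always release the workers, even on error
)
```

Calling fit_row(1, df, traits) directly in the main session is a quick way to debug the worker code before sending it to the cluster.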

answered Mar 27 at 22:14 by Stonecraft

Don't get why this answer is not useful. It is the cause of my problem. Although I won't argue with anyone who downvotes the question, as it was rather poorly considered.

– Stonecraft
Mar 28 at 3:15

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55386373%2flmformula-in-r-behaves-differently-within-parlapply%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

I would do it this way; note the radically different data organization:

library(dplyr)
library(tidyr)
library(tibble)
library(parallel)

#You seem to have rows of data that should be columns,
# this puts things in a form more suitable for work in R
df_new <- df %>% 
 mutate(row = 1:n()) %>% 
 gather(key = sample,value = val,sample1:sample4) %>% 
 arrange(row,sample)

#Data in rownames is not terribly useful
traits_new <- rownames_to_column(traits,"sample")

#Now we can put it all in *one* data frame
df_new <- left_join(df_new,
 traits_new,
 by = "sample")

#...and split it into a list representing each of the df's you
# want a lm() fit on
df_new_split <- split(df_new,df_new$row)

#Wrapper for lm with the only formula we need
fit_lm <- function(x)
 lm(val ~ var1,data = x)


num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)

results <- parLapply(cl = cl,df_new_split,fit_lm)

answered Mar 27 at 21:20

joran

141k22 gold badges350 silver badges399 bronze badges

Thanks, but I really need the data to be organized this way because of what I do with it downstream. My real question is why the string is interpreted differently in the two contexts.

– Stonecraft
Mar 27 at 21:22

@Thoughtcraft The reason is bc lm() isn't meant to be used in the way you are. The ability to pass variable names in formulae that are looked up in the calling environment (how the first attempt works) is meant for simple, interactive work, not programming. All the data transformations I show can be undone later.

– joran
Mar 27 at 21:25

Sure, but then why does lm work with the same string and data outside parLapply? This is what I want to figure out.

– Stonecraft
Mar 27 at 21:26

@Thoughtcraft Because when you aren't splitting things into multiple processes, R has a system for looking up name in successive environments. Even then, things could go very wrong if you're doing it inside nested functions. But using parLapply I would have zero assumptions about what each process can access from your global environment.

– joran
Mar 27 at 21:30

1

@Thoughtcraft Having access to, and knowing to look there are two different things. When you make an assignment like traits2 <- traits, R has a process for looking for traits that includes checking argument to the function you're inside. But remember that once you call lm() you've descended a whole layer of scope. Everything that happens there is in another whole new environment. And the way lm works is that it checks the data argument, and then it checks the calling environment; merely passing an argument doesn't create the actual argument in that environment.

– joran
Mar 27 at 21:59

|
show 3 more comments

I would do it this way; note the radically different data organization:

library(dplyr)
library(tidyr)
library(tibble)
library(parallel)

#You seem to have rows of data that should be columns,
# this puts things in a form more suitable for work in R
df_new <- df %>% 
 mutate(row = 1:n()) %>% 
 gather(key = sample,value = val,sample1:sample4) %>% 
 arrange(row,sample)

#Data in rownames is not terribly useful
traits_new <- rownames_to_column(traits,"sample")

#Now we can put it all in *one* data frame
df_new <- left_join(df_new,
 traits_new,
 by = "sample")

#...and split it into a list representing each of the df's you
# want a lm() fit on
df_new_split <- split(df_new,df_new$row)

#Wrapper for lm with the only formula we need
fit_lm <- function(x)
 lm(val ~ var1,data = x)


num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)

results <- parLapply(cl = cl,df_new_split,fit_lm)

answered Mar 27 at 21:20

joran

141k22 gold badges350 silver badges399 bronze badges

Thanks, but I really need the data to be organized this way because of what I do with it downstream. My real question is why the string is interpreted differently in the two contexts.

– Stonecraft
Mar 27 at 21:22

@Thoughtcraft The reason is bc lm() isn't meant to be used in the way you are. The ability to pass variable names in formulae that are looked up in the calling environment (how the first attempt works) is meant for simple, interactive work, not programming. All the data transformations I show can be undone later.

– joran
Mar 27 at 21:25

Sure, but then why does lm work with the same string and data outside parLapply? This is what I want to figure out.

– Stonecraft
Mar 27 at 21:26

@Thoughtcraft Because when you aren't splitting things into multiple processes, R has a system for looking up name in successive environments. Even then, things could go very wrong if you're doing it inside nested functions. But using parLapply I would have zero assumptions about what each process can access from your global environment.

– joran
Mar 27 at 21:30

@Thoughtcraft Having access to something and knowing to look there are two different things. When you make an assignment like traits2 <- traits, R has a process for looking up traits that includes checking the arguments to the function you're inside. But remember that once you call lm() you've descended a whole layer of scope. Everything that happens there is in a whole new environment. And the way lm works is that it checks the data argument, and then it checks the calling environment; merely passing an object as an argument doesn't create that object in that environment.

– joran
Mar 27 at 21:59
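
To make the lookup rule in these comments concrete, here is a minimal sketch (not from the original thread; make_formula, local_marker and fit_in_function are hypothetical names). A formula object carries the environment it was created in, and ?lm documents that variables not found in data are taken from environment(formula). A formula built with as.formula() inside a function is therefore tied to that function's frame, which is probably why local objects are found there even on a worker process:

```r
# Hypothetical helper: turns a model string into a formula inside a function
make_formula <- function(model_string) {
  local_marker <- "only visible inside make_formula"
  as.formula(model_string)
}

f_inside <- make_formula("y ~ x")

# The formula remembers the frame of make_formula, so the local object
# is visible through environment(f_inside)
inside_sees_marker <- exists("local_marker",
                             envir = environment(f_inside),
                             inherits = FALSE)

# Same idea with lm(): y and x exist only inside this function, and lm()
# finds them through the environment attached to the formula
fit_in_function <- function(model_string) {
  y <- c(1.1, 1.9, 3.2, 3.9)
  x <- 1:4
  f <- as.formula(model_string)
  lm(f)
}

fit <- fit_in_function("y ~ x")
coef(fit)
```

The takeaway, under these assumptions, is that a formula is only as portable as its environment; supplying data = removes the dependence on where the formula was created.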

So. The following works just dandy:

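require(parallel)
num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)
results <- parLapply(cl = cl, 1:10, function(i, df, traits) {
 # Response: row i of df, transposed into a column
 row = t(df[i,])
 # Single quotes inside the string so that it parses
 ModString = "row ~ traits[,'var1']"
 # as.formula() is called here, so the formula's environment is this function
 Mod = lm(as.formula(ModString))
 return(Mod)
}, df = df, traits = traits)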

answered Mar 27 at 22:14

Stonecraft

3802 silver badges13 bronze badges

I don't get why this answer is considered not useful. It addresses the cause of my problem. Although I won't argue with anyone who downvotes the question, as it was rather poorly considered.

– Stonecraft
Mar 28 at 3:15
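
A variant of the self-answer that keeps the row-wise layout but avoids relying on environment lookup at all might look like the sketch below. It is only an illustration: df and traits are made-up stand-ins for the asker's objects, fit_row is a hypothetical name, and the cluster size of 2 is arbitrary. Each worker builds a small data frame and hands it to lm() through data =, so the formula val ~ var1 never needs anything from outside the function.

```r
library(parallel)

# Made-up stand-ins for the asker's objects (assumption, not the real data)
set.seed(1)
samples <- paste0("sample", 1:4)
traits <- data.frame(var1 = rnorm(4), row.names = samples)
df <- as.data.frame(matrix(rnorm(40), nrow = 10,
                           dimnames = list(NULL, samples)))

# Hypothetical worker function: fits one model per row of df
fit_row <- function(i, df, traits) {
  d <- data.frame(
    val  = unlist(df[i, rownames(traits)]),  # row i, ordered like traits
    var1 = traits$var1
  )
  lm(val ~ var1, data = d)
}

cl <- makeCluster(2)
results <- parLapply(cl, seq_len(nrow(df)), fit_row,
                     df = df, traits = traits)
stopCluster(cl)

# One fitted model per row of df
length(results)
```

Passing df and traits as extra arguments ships them to the workers with each call; for large objects, exporting them once with clusterExport() may be the better trade-off.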
