Spark Scheduler pool jobs are not running parallel as I expected

I am trying to run two Spark actions as shown below, and I expect them to run in parallel because they use different pools. Does scheduling with pools mean that different, independent actions will run in parallel? That is, if I have 200 cores, will pool1 use 100 cores and pool2 use 100 cores while each processes its action?
In my case, the action on the second dataframe starts only after the first dataframe action has completed in pool1.

spark.setLocalProperty("spark.scheduler.pool","pool1")
dataframe.show(100,false)

spark.setLocalProperty("spark.scheduler.pool","pool2")
dataframe2.show(100,false)

My pool configuration XML:

<?xml version="1.0"?>

<allocations>
  <pool name="pool1">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
  </pool>
  <pool name="pool2">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
  </pool>
</allocations>

asked Mar 22 at 3:39

user7481861

Have you set the conf property spark.scheduler.allocation.file to point to your pool configuration XML? For example: conf.set("spark.scheduler.allocation.file", "/path/to/file")

– Anurag Sharma
Mar 22 at 9:20
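
As a rough illustration of the comment above, the sketch below wires the allocation file into a session. It assumes a local master and a hypothetical application name, and it keeps the comment's placeholder path as-is. It also sets spark.scheduler.mode to FAIR, which the Spark job-scheduling documentation describes as the setting that enables the fair scheduler; the question does not show whether that was set. The helper names (FairPoolsConfig, schedulerSettings, buildSession) are made up for this example.

```scala
import org.apache.spark.sql.SparkSession

object FairPoolsConfig {
  // Settings that enable fair scheduling and point Spark at the pool definitions.
  // The path passed in is a placeholder, as in the comment above.
  def schedulerSettings(allocationFile: String): Map[String, String] = Map(
    "spark.scheduler.mode" -> "FAIR",
    "spark.scheduler.allocation.file" -> allocationFile
  )

  // Builds (or reuses) a session with those settings applied.
  def buildSession(allocationFile: String): SparkSession = {
    val builder = SparkSession.builder()
      .appName("fair-pools-example") // hypothetical application name
      .master("local[*]")            // assumption: local run, replace for a cluster
    schedulerSettings(allocationFile)
      .foldLeft(builder) { case (b, (key, value)) => b.config(key, value) }
      .getOrCreate()
  }
}
```

These settings are read when the SparkContext starts, so they need to be in place before the session is first created.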

apache-spark job-scheduling

1 Answer

Based on the details given, your jobs should run in parallel with this Spark configuration, but there are a few things to check.

Is YARN your cluster manager? If so, have you configured the pool in the YARN configuration?

I can see you are using the FAIR scheduler, which means the default scheduler is being overridden. Have you configured the same in YARN?

To configure the FAIR scheduler, see the link below, which covers everything in detail:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html

answered Mar 22 at 5:43

Dhrub Thakur
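
Separately from the cluster-manager settings discussed in the answer, the Spark job-scheduling documentation notes that jobs inside one application run concurrently only when they are submitted from different threads, and that spark.scheduler.pool is a thread-local property. The snippet in the question calls show twice from one thread, and show blocks until it finishes, so the second action waits for the first. A minimal sketch of submitting the two actions from separate threads follows. It assumes `spark` is a SparkSession and that dataframe and dataframe2 are the DataFrames from the question; ParallelPools, runInPool and showBoth are hypothetical names.

```scala
import java.util.concurrent.Executors

import org.apache.spark.sql.{DataFrame, SparkSession}

import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}

object ParallelPools {
  // Runs `action` on a thread from `ec`; jobs started there are tagged with `pool`.
  def runInPool[T](spark: SparkSession, pool: String)(action: => T)
                  (implicit ec: ExecutionContext): Future[T] = Future {
    // The pool is a thread-local property, so it is set inside the worker thread.
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", pool)
    try action
    finally spark.sparkContext.setLocalProperty("spark.scheduler.pool", null) // clear the pool
  }

  // Submits both show() actions at once and waits for both to finish.
  def showBoth(spark: SparkSession, dataframe: DataFrame, dataframe2: DataFrame): Unit = {
    val threads = Executors.newFixedThreadPool(2) // one thread per action
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(threads)
    try {
      val first = runInPool(spark, "pool1")(dataframe.show(100, false))
      val second = runInPool(spark, "pool2")(dataframe2.show(100, false))
      Await.result(first, Duration.Inf)
      Await.result(second, Duration.Inf)
    } finally threads.shutdown()
  }
}
```

Output from the two show calls may interleave on the console, and how much the jobs actually overlap still depends on the cores available to the application.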
