Spark Scheduler pool jobs are not running parallel as I expected
I am trying to run two Spark actions as shown below, and I expect them to run in parallel because they use different pools. Does scheduling with pools mean that different, independent actions will run in parallel? That is, if I have 200 cores, will pool1 use 100 cores and pool2 use 100 cores while each processes its action?
In my case, the second DataFrame action only starts after the first DataFrame action has completed in pool1.
spark.setLocalProperty("spark.scheduler.pool","pool1")
dataframe.show(100,false)
spark.setLocalProperty("spark.scheduler.pool","pool2")
dataframe2.show(100,false)
My pool configuration XML:
<?xml version="1.0"?>
<allocations>
<pool name="pool1">
<schedulingMode>FAIR</schedulingMode>
<weight>1</weight>
</pool>
<pool name="pool2">
<schedulingMode>FAIR</schedulingMode>
<weight>1</weight>
</pool>
</allocations>
apache-spark job-scheduling
Have you set the conf property spark.scheduler.allocation.file to point to the pool configuration XML? conf.set("spark.scheduler.allocation.file", "/path/to/file")
– Anurag Sharma
Mar 22 at 9:20
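The property mentioned in this comment can be set when the session is built. The following is a minimal sketch, not the asker's actual setup: the object name and app name are made up for illustration, "/path/to/file" is the placeholder from the comment and has to be replaced with the real location of the pool XML, and the master is assumed to be supplied by spark-submit. It also sets spark.scheduler.mode to FAIR, since Spark schedules jobs within an application in FIFO order by default.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object FairPoolsConfig {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("fair-pools-example") // assumed name
      // Switch the in-application scheduler from the default FIFO to FAIR.
      .config("spark.scheduler.mode", "FAIR")
      // Placeholder path from the comment; replace with the real XML file.
      .config("spark.scheduler.allocation.file", "/path/to/file")
      .getOrCreate()

    // Print what the running context picked up, to confirm the setting.
    println(spark.sparkContext.getConf.get("spark.scheduler.mode"))

    spark.stop()
  }
}
```

Both properties are read when the SparkContext starts, so they need to be in place before the session is created; changing them on an already running context is not expected to have an effect.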
asked Mar 22 at 3:39
user7481861
1 Answer
Based on the details given, your jobs should run in parallel according to the Spark configuration, but there are a few parameters to consider.
Is YARN your cluster manager? If it is, have you configured the pools in the YARN configuration?
I can see that you are using the FAIR scheduler, which means the default scheduler is being overridden. Have you configured the same in YARN?
To configure the FAIR scheduler, please go through the link below, where everything is explained in detail:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
answered Mar 22 at 5:43
Dhrub Thakur
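One point worth checking alongside the cluster configuration: in the question, both show calls are issued from the same thread, and an action blocks until its job finishes, so the second job cannot be submitted before the first one completes, whichever pool it is assigned to. Within one application, Spark runs jobs concurrently only when they are submitted from separate threads, and spark.scheduler.pool is a thread-local property. The sketch below illustrates that idea under stated assumptions: the object and helper names (ParallelPools, showInPool) are hypothetical, the two DataFrames are stand-ins built with spark.range, and the master is assumed to come from spark-submit.

```scala
import java.util.concurrent.Executors

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical names throughout; this is an illustration, not the asker's code.
object ParallelPools {

  // Runs the action on a separate thread and binds that thread to a pool.
  // The pool property is thread-local, so it is set inside the Future body.
  def showInPool(spark: SparkSession, df: DataFrame, pool: String)
                (implicit ec: ExecutionContext): Future[Unit] = Future {
    val sc = spark.sparkContext
    sc.setLocalProperty("spark.scheduler.pool", pool)
    try df.show(100, false)
    // Clear the pool so a reused thread does not keep the assignment.
    finally sc.setLocalProperty("spark.scheduler.pool", null)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parallel-pools-example") // assumed name
      .config("spark.scheduler.mode", "FAIR")
      .getOrCreate()

    // Stand-in data; the question's real DataFrames are not shown.
    val dataframe = spark.range(0, 1000000).toDF("id")
    val dataframe2 = spark.range(0, 1000000).toDF("id")

    // Two threads, one per job, so both jobs can be submitted at once.
    val executor = Executors.newFixedThreadPool(2)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(executor)

    val jobs = Seq(
      showInPool(spark, dataframe, "pool1"),
      showInPool(spark, dataframe2, "pool2")
    )

    // Wait for both actions to finish before shutting down.
    Await.result(Future.sequence(jobs), Duration.Inf)

    executor.shutdown()
    spark.stop()
  }
}
```

With this layout the two jobs should appear as overlapping in the Spark UI, and the printed rows from the two show calls may interleave on stdout. How the cores are actually divided between the pools still depends on the pool weights, minShare settings, and the number of tasks each job has available.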