How to cache the leftmost table in memory for a left outer join in Hive
I have a large table (1 TB of data) that needs to be joined with a smaller table (100k records):
SELECT st.id
FROM small_table st
LEFT JOIN large_table lt
ON st.id = lt.id
In the above scenario, I am not able to control which table is cached in memory. I have tried the MAPJOIN and STREAMTABLE hints, and also parameters such as the conditional task size and the small table size. Since the small table is on the leftmost side of the join, it is not being cached in memory.
Is there a way to control which table is cached?
Note: I cannot alter the table positions or the code:
Neither here...

...nor here...

Parameters used:
set hive.execution.engine=tez;
set hive.tez.container.size=4096;
set hive.merge.mapredfiles=true;
set tez.shuffle-vertex-manager.min-src-fraction=0.25;
set tez.shuffle-vertex-manager.max-src-fraction=0.75;
set hive.exec.dynamic.partition.mode=nonstrict;
set tez.am.resource.memory.mb=3200;
set tez.am.java.opts=-server -Xmx3200m -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+UseParallelGC -XX:+UseConcMarkSweepGC;
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask.size=288435456;
performance hive mapjoin
If I change the positions of the tables, then a map join is enabled: SELECT st.id FROM large_table lt LEFT JOIN small_table st ON st.id = lt.id
– PK25
Mar 22 at 23:16
Please also add the parameters used
– leftjoin
Mar 23 at 8:03
set hive.execution.engine=tez; set hive.tez.container.size=4096; set hive.merge.mapredfiles=true; set tez.shuffle-vertex-manager.min-src-fraction=0.25; set tez.shuffle-vertex-manager.max-src-fraction=0.75; set hive.exec.dynamic.partition.mode=nonstrict; set tez.am.resource.memory.mb=3200 ; set tez.am.java.opts=-server -Xmx3200m -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+UseParallelGC -XX:+UseConcMarkSweepGC ; SET hive.auto.convert.join=true; set hive.auto.convert.join.noconditionaltask.size=288435456;
– PK25
Mar 24 at 2:38
edited Mar 24 at 8:00
leftjoin
asked Mar 22 at 23:10
PK25
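The behaviour reported in the question and the first comment (a map join only kicks in once the small table sits on the right of the LEFT JOIN) is consistent with how a map join works in general: the cached table is replicated to every mapper, the other table is streamed in splits, and for an outer join the streamed table has to be the row-preserving one. The Python sketch below is only an illustration of that reasoning, not Hive's implementation. The toy ids, the two-way splits and all function names are assumptions made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy data standing in for small_table and large_table (ids only).
small_table = [1, 2, 3]            # left side of the LEFT JOIN, rows preserved
large_table = [1, 1, 2, 4, 5, 2]   # right side of the LEFT JOIN


def reference_left_join(left, right):
    """Single-process LEFT JOIN on id: the result every plan must reproduce."""
    index = defaultdict(list)
    for r in right:
        index[r].append(r)
    out = []
    for l in left:
        matches = index.get(l, [])
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out


def map_join_cache_right(left_splits, right):
    """Each mapper streams one split of the LEFT table and probes a hash table
    built from the whole cached RIGHT table. Every left row is seen exactly
    once, so "no match" can be decided locally."""
    index = defaultdict(list)
    for r in right:
        index[r].append(r)
    out = []
    for split in left_splits:          # one iteration = one independent mapper
        for l in split:
            matches = index.get(l, [])
            if matches:
                out.extend((l, r) for r in matches)
            else:
                out.append((l, None))
    return out


def map_join_cache_left(left, right_splits):
    """Naive attempt: cache the whole LEFT table in every mapper and stream
    splits of the RIGHT table. A mapper only knows its own split, so it cannot
    tell whether a left row is unmatched globally."""
    left_ids = set(left)
    out = []
    for split in right_splits:         # one iteration = one independent mapper
        matched = set()
        for r in split:
            if r in left_ids:
                out.append((r, r))
                matched.add(r)
        for l in left:
            if l not in matched:
                out.append((l, None))  # may be wrong: another split may match
    return out


expected = Counter(reference_left_join(small_table, large_table))
cached_right = Counter(map_join_cache_right([[1, 2], [3]], large_table))
cached_left = Counter(map_join_cache_left(small_table, [[1, 1, 2], [4, 5, 2]]))

if __name__ == "__main__":
    print("cache right side matches reference:", cached_right == expected)
    print("cache left side matches reference: ", cached_left == expected)
    print("spurious rows when caching left:   ", cached_left - expected)
```

In this toy setup, caching the left (preserved) table makes each mapper emit NULL-padded rows for ids that another split does match, so the result would need an extra reduce step to fix. That would explain why swapping the tables enables the map join, although the swapped query preserves the rows of large_table instead of small_table and is therefore not equivalent to the original. Whether any hint or setting changes this in a given Hive version is something to confirm from the EXPLAIN output rather than assume.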