What should be my Hive partitioning strategy and view strategy so that a query can run efficiently and return results within 10 seconds?

My use case is that I have two data sources:
1. Source1 (as the speed layer)
2. A Hive external table on top of S3 (as the batch layer)

I am using Presto to query data from both data sources through a view.
I want to create a view that unions the data from both sources, like: "create view test as select * from Source1.table union all select * from hive.table"

We keep 24 hours of data in Source1; after 24 hours, that data is migrated to S3 via Hive.

The columns of the Source1 tables are: timestamp, logtype, company, category

Users will query data by timestamp range (the last 15/30 minutes, last x hours, last x days, last x months, etc.).
Examples: "select * from test where timestamp > (now() - interval '15' minute)", "select * from test where timestamp > (now() - interval '12' hour)", "select * from test where timestamp > (now() - interval '1' day)"

To satisfy these queries I need to partition the Hive table, and the user should not have to be aware of the underlying strategy, i.e. a user querying the last x minutes of data should not need to care whether Presto is reading the data from Source1 or from Hive.

What should be my Hive partitioning strategy and view strategy so that a query can run efficiently and return results within 10 seconds?

edited Apr 2 at 14:32

asked Mar 27 at 16:06

unknown_k

hive bigdata data-warehouse presto data-partitioning

1 Answer

For Hive, the partition column should be one that queries actually filter on.

In your case that is timestamp. However, if you partition directly on timestamp, Hive would create a partition for every distinct second (or millisecond), depending on the data in the column.
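To put rough numbers on this, here is a small Python sketch that counts the maximum number of partitions one year of data could produce at different granularities. The function name and the 365-day year are assumptions made for illustration; the real count depends on how many distinct values actually occur in the data.

```python
# Upper bound on partitions created per year of data at a given granularity.
# Assumes a 365-day year and at least one record in every time slot.
DAYS_PER_YEAR = 365

SLOTS_PER_DAY = {
    "day": 1,
    "hour": 24,
    "minute": 24 * 60,
    "second": 24 * 60 * 60,
}

def partitions_per_year(granularity: str) -> int:
    """Return the maximum number of partitions one year of data can create."""
    if granularity not in SLOTS_PER_DAY:
        raise ValueError(f"unknown granularity: {granularity}")
    return DAYS_PER_YEAR * SLOTS_PER_DAY[granularity]

if __name__ == "__main__":
    for g in SLOTS_PER_DAY:
        print(g, partitions_per_year(g))
```

Hourly partitioning stays in the thousands per year, while per-second partitioning can reach tens of millions.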

A better solution is to derive columns such as year, month, day and hour from the timestamp and use these as the partition columns.
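As a minimal sketch of that derivation, the Python below maps a timestamp to year/month/day/hour values and to a Hive-style `key=value` directory path. The function names, the use of UTC, and the zero-padding are assumptions for illustration, not something the question specifies.

```python
from datetime import datetime, timezone

def partition_values(ts: datetime) -> dict:
    """Derive year/month/day/hour partition values from a timezone-aware timestamp.

    The timestamp is normalised to UTC first (an assumption; use whatever
    time zone the ingestion job uses consistently).
    """
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = ts.astimezone(timezone.utc)
    return {"year": utc.year, "month": utc.month, "day": utc.day, "hour": utc.hour}

def partition_path(ts: datetime) -> str:
    """Build a Hive-style partition directory suffix for the timestamp."""
    p = partition_values(ts)
    return (
        f"year={p['year']}/month={p['month']:02d}/"
        f"day={p['day']:02d}/hour={p['hour']:02d}"
    )

if __name__ == "__main__":
    print(partition_path(datetime(2019, 3, 27, 16, 6, tzinfo=timezone.utc)))
```

All records from the same hour land in the same partition, so a query restricted to a few hours only needs to read a few directories.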

The same strategy will work for Kudu. Be advised, however, that it can create hot-spotting: all newly arriving records go to the same (most recent) partition, which limits insert (and possibly query) performance.

To overcome this, use one additional column as a hash partition along with the timestamp-derived columns,

e.g. year, month, day, hour, logtype.
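The idea of adding a hash partition can be illustrated with a short Python sketch. Here `zlib.crc32` is only a stand-in for whatever hash function the storage engine really uses, and the bucket count of 4 and the function names are hypothetical.

```python
import zlib

def hash_bucket(value: str, num_buckets: int = 4) -> int:
    """Map a column value (e.g. logtype) to a stable bucket number.

    crc32 is an illustrative stand-in, not the hash used by Kudu or Hive.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    return zlib.crc32(value.encode("utf-8")) % num_buckets

def partition_key(year: int, month: int, day: int, hour: int, logtype: str) -> tuple:
    """Combine the time-derived columns with a hash bucket of logtype."""
    return (year, month, day, hour, hash_bucket(logtype))

if __name__ == "__main__":
    for logtype in ["error", "access", "audit", "debug"]:
        print(logtype, partition_key(2019, 3, 27, 16, logtype))
```

Records arriving in the same hour can then be spread over several buckets rather than all going to one most-recent partition, provided the hashed column has enough distinct values.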

answered Mar 29 at 16:22

shanmuga


Since timestamp must be the only filter criterion, you are limited to using timestamp as the partition key. But creating a partition for every timestamp value (every second) will create too many partitions in Hive and will hurt performance.

– shanmuga
Apr 1 at 8:59
