Adding Extra HASH partitions to already HASH partitioned table
Hi, I currently have a table with 100 HASH partitions. I have decided to increase this to 1000 partitions to allow for future scaling.
Do I need to remove the partitions from the table and then add the 1000 partitions afterwards, or is there a way to add the extra 900 partitions to the already partitioned table?
I partitioned the table using the code below.
ALTER TABLE t1
PARTITION BY HASH(venue_id)
PARTITIONS 100;
Is there also a way to get an estimate of how long it will take to add 1000 partitions to my table? I will be using one of Percona's tools to do this, which will prevent the table from locking: https://www.percona.com/doc/percona-toolkit/LATEST/pt-online-schema-change.html
mysql partitioning
asked Mar 25 at 11:01
Lukerayner
I won't be surprised if you gain nothing by the change. Possibly, your SELECTs will slow down. Keep us posted. And provide SHOW CREATE TABLE and the main SELECT.
– Rick James
Apr 17 at 3:21
2 Answers
You don't need to remove partitioning to repartition. The ALTER is going to copy the rows into a new table anyway, so you might as well do this in one step.
Just ALTER TABLE and define the new partitioning scheme:
ALTER TABLE t1
PARTITION BY HASH(venue_id)
PARTITIONS 1000;
Or with pt-online-schema-change:
pt-online-schema-change h=myhost,D=mydatabase,t=t1 \
  --alter "PARTITION BY HASH(venue_id) PARTITIONS 1000" \
  --execute
(I put line breaks in there to avoid line-wrapping, but that's one command.)
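To see why the rows get rewritten regardless, it may help to recall that MySQL's HASH partitioning picks the partition as MOD(expr, number_of_partitions). The following is only a small Python simulation of that rule for non-negative integer keys, not anything MySQL itself runs; the function names and the venue_id values are made up for illustration.

```python
def hash_partition(venue_id: int, partitions: int) -> int:
    """Partition number for a non-negative integer key: MOD(venue_id, partitions)."""
    return venue_id % partitions


def fraction_unmoved(venue_ids, old_parts: int, new_parts: int) -> float:
    """Share of keys whose partition number is the same before and after."""
    ids = list(venue_ids)
    same = sum(
        1 for v in ids
        if hash_partition(v, old_parts) == hash_partition(v, new_parts)
    )
    return same / len(ids)


# Hypothetical venue ids 0..9999, purely for illustration.
sample_ids = range(10_000)

print(hash_partition(146, 100))    # partition number with 100 partitions
print(hash_partition(146, 1000))   # partition number with 1000 partitions
print(fraction_unmoved(sample_ids, 100, 1000))
```

In this sample only one key in ten keeps its partition number, and venue_id 146 goes from partition 46 to partition 146. So if a query names a partition explicitly, such as PARTITION(p46), the partition number would need to be recomputed after the change.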
I forgot to comment on your other question, about predicting the ETA for completion.
One advantage of the Percona script is that it reports progress and you can get an estimate of the completion from that. Although in our environment, we find that it's not very accurate. It can sometimes report that it's 99% complete for hours.
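For a rough cross-check of the tool's estimate, one option is a simple linear extrapolation from the rows copied so far. This is a back-of-the-envelope sketch that assumes the copy rate stays constant (it often does not); the function name and the figures are hypothetical, and the row counts would have to come from your own monitoring.

```python
def estimate_remaining_seconds(rows_copied: int, total_rows: int,
                               elapsed_seconds: float) -> float:
    """Linear extrapolation: assumes the remaining rows copy at the average rate so far."""
    if rows_copied <= 0 or elapsed_seconds <= 0:
        raise ValueError("need some copied rows and elapsed time to extrapolate")
    rate = rows_copied / elapsed_seconds            # rows per second so far
    remaining_rows = max(total_rows - rows_copied, 0)
    return remaining_rows / rate


# Hypothetical figures: 25 million of 100 million rows copied in 30 minutes.
eta = estimate_remaining_seconds(25_000_000, 100_000_000, 30 * 60)
print(f"about {eta / 60:.0f} minutes left")
```

Since InnoDB row counts are themselves estimates, any figure like this is approximate at best.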
Also keep in mind that the Percona script is not 100% without locking. It needs an exclusive metadata lock briefly at the start and end of its run, because it needs to create triggers and then rename the tables and drop the triggers at the end. Any running query or open transaction on the table, even a read-only SELECT, will block the metadata lock request. If the script has trouble completing, make sure any queries and transactions you run against your table finish quickly (or else you must kill them).
edited Mar 25 at 14:03
answered Mar 25 at 13:16
Bill Karwin
Thank you very much for this answer. I am currently running the percona command and once it has finished and worked I will mark this as the correct answer. :)
– Lukerayner
Mar 25 at 13:57
That worked perfectly and thank you for the extra info in your update. I just thought it would be worth asking if once I have added the 1000 partitions should the performance be the same or a bit slower? I don't need 1000 just yet but in a year or 2 I will so I just thought it was best to do it now before I had loads of data making the alter take hours/days.
– Lukerayner
Mar 25 at 14:30
I frequently say that tables don't have performance — queries have performance. There are certainly queries that will not perform well no matter how many partitions you have, and the greater number of partitions may cause them to be slower.
– Bill Karwin
Mar 25 at 14:34
That is a very true comment ;) Ok thank you for the advice, I am forcing the query to look at a certain partition using PARTITION(p46) for example. I am hoping because I do this the total number of partitions shouldn't have an impact.
– Lukerayner
Mar 25 at 14:37
Yes, if you limit the partitions explicitly, or else if the optimizer does that for you by partition pruning, then dividing your table into smaller partitions should help it scan fewer rows, which will reduce the overall query time.
– Bill Karwin
Mar 25 at 14:39
PARTITION BY HASH is virtually useless. I don't expect it to help you with 100 partitions, nor with 1000.
You get more bang for your buck by arranging to have venue_id as the first column in the PRIMARY KEY.
Does the query always have a single venue_id? (If not the options get messier.) For now, I will assume you always have WHERE venue_id = constant.
You have a multi-dimensional indexing problem. INDEXes are only one dimension, so things get tricky. However, partitioning can be used to sort of get a two-dimensional index.
Let's pick day_epoch as the partition key and use PARTITION BY RANGE(day_epoch). (If you change that from a 4-byte INT to a 3-byte DATE, then use PARTITION BY RANGE(TO_DAYS(day_epoch))).
Then let's decide on the PRIMARY KEY. Note: When adding or removing partitioning, the PK should be re-thought. Keep in mind that a PK is a unique index. And the data is clustered on the PK. (However, uniqueness is not guaranteed across partitions.)
So...
PARTITION BY RANGE(day_epoch)
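PRIMARY KEY(venue_id, zone_id, id) -- in this order
(MySQL also requires the partitioning column in every unique key, so in practice day_epoch has to be appended, e.g. PRIMARY KEY(venue_id, zone_id, id, day_epoch).)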
Without partitioning, I recommend
PRIMARY KEY(venue_id, zone_id, day_epoch, id)
In general, any index (including the PK) should start with any column(s) that are tested with =. Then IN, then at most one 'range'.
For the sake of the uniqueness requirement of the PK, I put the id last.
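The ordering rule above can be illustrated with a toy model: treat the index as a sorted list of tuples and look at where the matching rows fall. This is a Python sketch with made-up rows, not how InnoDB is implemented; it only shows that with (venue_id, zone_id, day_epoch, id) the rows for one venue, one zone and a date range sit next to each other.

```python
from bisect import bisect_left, bisect_right

# Made-up rows as (venue_id, zone_id, day_epoch, id), kept sorted like index order.
index = sorted(
    (venue, zone, day, venue * 10_000 + zone * 100 + day)
    for venue in range(1, 4)
    for zone in range(1, 4)
    for day in range(1, 11)
)


def range_scan(venue_id: int, zone_id: int, day_from: int, day_to: int):
    """Return (start, stop) positions for venue_id = ?, zone_id = ?, day BETWEEN ? AND ?."""
    start = bisect_left(index, (venue_id, zone_id, day_from))
    stop = bisect_right(index, (venue_id, zone_id, day_to, float("inf")))
    return start, stop


start, stop = range_scan(venue_id=2, zone_id=3, day_from=4, day_to=6)
rows = index[start:stop]
print(rows)
print(stop - start)
```

All matching rows come from one contiguous slice of the sorted list, which is the property that keeps the number of disk hits low. If the range column came before an = column, the matches would be scattered across a wider slice instead.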
So, the query performs something like this:
- "Partition pruning" -- probably down to a single partition, based on the date.
- Drill down the PK directly to the consecutive rows for the one venue_id in question.
- Hopscotch across the data based on the zone_ids. (In some situations, this may be a range scan instead of the jumping around. This depends on the version, number of ids, values of the ids, and perhaps the phase of the moon.)
- (If it makes it this far) Then get the desired date.
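The first step in the list above, partition pruning, can be sketched in the same toy style. The boundaries below are hypothetical VALUES LESS THAN limits on day_epoch; the sketch only mimics which RANGE partitions a date range would touch, and the real decision is made by MySQL's optimizer.

```python
from bisect import bisect_right

# Hypothetical upper bounds (VALUES LESS THAN ...) for partitions p0..p3;
# the last entry plays the role of MAXVALUE.
upper_bounds = [100, 200, 300, float("inf")]


def partitions_touched(day_from: int, day_to: int) -> list:
    """Names of the RANGE partitions that can hold day_epoch BETWEEN day_from AND day_to."""
    first = bisect_right(upper_bounds, day_from)
    last = bisect_right(upper_bounds, day_to)
    return [f"p{i}" for i in range(first, last + 1)]


print(partitions_touched(120, 150))   # a range inside one partition
print(partitions_touched(190, 210))   # a range straddling a boundary
```

A date range that fits inside one partition touches only that partition; a range that crosses a boundary touches two, and the rest of the table is never looked at.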
When fetching lots of rows from a huge table, the most important thing is to minimize disk hits. What I just described probably does that better than the alternatives. Partitioning on venue_id helps only with that one column, but fails to help with the rest.
add a comment |
PARTITION BY HASH is virtually useless. I don't expect it to help you with 100 partitions, nor with 1000.
You get more bang for your buck by arranging to have venue_id as the first column in the PRIMARY KEY.
Does the query always have a single venue_id? (If not the options get messier.) For now, I will assume you always have WHERE venue_id = constant.
You have a multi-dimensional indexing problem. INDEXes are only one dimension, so things get tricky. However, partitioning can be used to sort of get a two-dimensional index.
Let's pick day_epoch as the partition key and use PARTITION BY RANGE(day_epoch). (If you change that from a 4-byte INT to a 3-byte DATE, then use PARTITION BY RANGE(TO_DAYS(day_epoch))).
Then let's decide on the PRIMARY KEY. Note: When adding or removing partitioning, the PK should be re-thought. Keep in mind that a PK is a unique index. And the data is clustered on the PK. (However, uniqueness is not guaranteed across partitions.)
So...
PARTITION BY RANGE(day_epoch)
PRIMARY KEY(venue_id, zone_id, id) -- in this order
Without partitioning, I recommend
PRIMARY KEY(venue_id, zone_id, day_epoch, id)
In general, any index (including the PK) should start with any column(s) that are tested with =. Then IN, then at most one 'range'.
For the sake of the uniqueness requirement of the PK, I put the id last.
So, the query performs something like this:
- "Partition pruning" -- probably down to a single partition, based on the date.
- Drill down the PK directly to the consecutive rows for the one
venue_idin question. - Hopscotch across the data based on the zone_ids. (In some situations, this may be a range scan instead of the jumping around. This depends on the version, number of ids, values of the ids, and perhaps the phase of the moon.
- (If it makes it this far) Then get the desired date.
When fetching lots of rows from a huge table, the most important thing is to minimize disk hits. What I just described probably does the job better than other situations. Partitioning on venue_id helps only with that one column, but fails to help with the rest.
answered Apr 17 at 15:30
Rick James
I won't be surprised if you gain nothing by the change. Possibly, your SELECTs will slow down. Keep us posted. And provide SHOW CREATE TABLE and the main SELECT.
– Rick James, Apr 17 at 3:21