Adding Extra HASH partitions to already HASH partitioned table


Hi, I currently have a table with 100 HASH partitions. I have decided to increase this to 1000 partitions to allow for future scaling.

Do I need to remove the partitioning from the table and then add the 1000 partitions, or is there a way to add the extra 900 partitions to the already partitioned table?

I partitioned the table using the code below.

    ALTER TABLE t1
    PARTITION BY HASH(venue_id)
    PARTITIONS 100;

Is there also a way to estimate how long it will take to repartition my table into 1000 partitions? I will be using one of Percona's tools to do this, which should prevent the table from locking: https://www.percona.com/doc/percona-toolkit/LATEST/pt-online-schema-change.html

  • I won't be surprised if you gain nothing from the change. Your SELECTs may even slow down. Keep us posted, and please provide SHOW CREATE TABLE and the main SELECT.

    – Rick James
    Apr 17 at 3:21

mysql partitioning






asked Mar 25 at 11:01
Lukerayner

2 Answers
You don't need to remove the partitioning before repartitioning. MySQL is going to copy the rows into a new table anyway, so you might as well do it in one step.

Just run ALTER TABLE and define the new partitioning scheme:

    ALTER TABLE t1
    PARTITION BY HASH(venue_id)
    PARTITIONS 1000;

Or with pt-online-schema-change:

    pt-online-schema-change h=myhost,D=mydatabase,t=t1 \
      --alter "PARTITION BY HASH(venue_id) PARTITIONS 1000" \
      --execute

(I put line breaks in there to avoid line-wrapping, but that's one command; the trailing backslashes let the shell read it that way.)
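
As a side note to the statements above: for tables partitioned by HASH or KEY, MySQL also accepts an ADD PARTITION form that states how many partitions to add rather than the new total. The following is a minimal sketch, assuming the table t1 and the database name mydatabase used in the commands above. It is an alternative to the statement above, not a follow-up, and it still has to redistribute the existing rows, so it should not be expected to be dramatically cheaper. The second query reads information_schema.PARTITIONS to confirm the result; TABLE_ROWS there is an estimate for InnoDB.

```sql
-- Run this INSTEAD of "PARTITION BY HASH(venue_id) PARTITIONS 1000",
-- not in addition to it: it adds 900 partitions to the existing 100.
ALTER TABLE t1 ADD PARTITION PARTITIONS 900;

-- Afterwards, confirm the partition count and see how the
-- (estimated) row counts are spread across the partitions.
SELECT COUNT(*)        AS partition_count,
       MIN(TABLE_ROWS) AS fewest_rows,
       MAX(TABLE_ROWS) AS most_rows
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'mydatabase'   -- assumed database name
  AND TABLE_NAME   = 't1';
```

If partition_count comes back as 1000, the repartitioning took effect, whichever form of the statement was used.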




I forgot to comment on your other question, about predicting the ETA for completion.

One advantage of the Percona script is that it reports progress, and you can estimate the completion time from that. In our environment, though, we find that it's not very accurate: it can sometimes report that it's 99% complete for hours.
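
One rough cross-check on the tool's own progress report is to compare row estimates for the original table and the copy being filled. This is only a sketch: it assumes the database is mydatabase as in the command above, and that the tool is using its default name for the copy, _t1_new. TABLE_ROWS is an estimate for InnoDB, so the ratio is a ballpark figure at best.

```sql
-- Compare the estimated size of the original table with the copy
-- that pt-online-schema-change is filling (assumed name: _t1_new).
SELECT TABLE_NAME,
       TABLE_ROWS,
       ROUND(DATA_LENGTH / 1024 / 1024) AS data_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydatabase'   -- assumed database name
  AND TABLE_NAME IN ('t1', '_t1_new');
```

Running it a few minutes apart gives a sense of how fast the copy is growing relative to the original.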



Also keep in mind that the Percona script is not 100% lock-free. It needs an exclusive metadata lock briefly at the start and end of its run, because it has to create triggers at the start, and rename the tables and drop the triggers at the end. Any query against the table, even a read-only SELECT, will block the metadata lock until it finishes. If the script has trouble completing, make sure any queries and transactions you run against your table finish quickly (or else kill them).
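
To find the sessions that could be holding up that metadata lock, something along these lines can help. It is a sketch, not part of the Percona tool: the 10-second threshold is arbitrary, and the thread id in the final comment is a placeholder to fill in from the query results.

```sql
-- Statements that have been running for more than 10 seconds
-- (the threshold is an arbitrary choice for illustration).
SELECT ID, USER, TIME, STATE, INFO
FROM information_schema.PROCESSLIST
WHERE COMMAND <> 'Sleep'
  AND TIME > 10
ORDER BY TIME DESC;

-- Open InnoDB transactions, oldest first. A transaction that has
-- touched t1 keeps its metadata lock until it commits or rolls back,
-- even if the connection is currently idle.
SELECT trx_mysql_thread_id, trx_started, trx_state, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started;

-- If a session has to go, kill it by its thread id (placeholder):
-- KILL <thread_id>;
```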







answered Mar 25 at 13:16 (edited Mar 25 at 14:03)
Bill Karwin

  • Thank you very much for this answer. I am currently running the Percona command, and once it has finished and worked I will mark this as the correct answer. :)

    – Lukerayner
    Mar 25 at 13:57

  • That worked perfectly, and thank you for the extra info in your update. Once I have added the 1000 partitions, should the performance be the same or a bit slower? I don't need 1000 just yet, but in a year or two I will, so I thought it best to do it now, before I have loads of data that would make the ALTER take hours or days.

    – Lukerayner
    Mar 25 at 14:30

  • I frequently say that tables don't have performance; queries have performance. There are certainly queries that will not perform well no matter how many partitions you have, and the greater number of partitions may cause them to be slower.

    – Bill Karwin
    Mar 25 at 14:34

  • That is a very true comment ;) OK, thank you for the advice. I am forcing the query to look at a specific partition, using PARTITION(p46) for example. I am hoping that, because I do this, the total number of partitions shouldn't have an impact.

    – Lukerayner
    Mar 25 at 14:37

  • Yes, if you limit the partitions explicitly, or else if the optimizer does that for you by partition pruning, then dividing your table into smaller partitions should help it scan fewer rows, which will reduce the overall query time.

    – Bill Karwin
    Mar 25 at 14:39
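
The two approaches in the last comments, explicit partition selection and optimizer pruning, can be compared with a sketch like the one below. The venue_id value 1046 is made up for illustration. With HASH(venue_id) and 1000 partitions, MySQL places a row in partition venue_id MOD 1000 and names the partitions p0 to p999 by default, so that value maps to p46.

```sql
-- Explicit partition selection, as described in the comment above.
-- venue_id = 1046 is a hypothetical value; 1046 MOD 1000 = 46, i.e. p46.
SELECT COUNT(*)
FROM t1 PARTITION (p46)
WHERE venue_id = 1046;

-- Letting the optimizer prune instead. The "partitions" column of the
-- plan shows which partitions it intends to read.
-- (On MySQL 5.6, use EXPLAIN PARTITIONS SELECT ... instead.)
EXPLAIN
SELECT COUNT(*)
FROM t1
WHERE venue_id = 1046;
```

If the plan lists only one partition for an equality test on venue_id, the explicit PARTITION clause is not buying anything extra for that query.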

















    PARTITION BY HASH is virtually useless. I don't expect it to help you with 100 partitions, nor with 1000.



    You get more bang for your buck by arranging to have venue_id as the first column in the PRIMARY KEY.



    Does the query always have a single venue_id? (If not the options get messier.) For now, I will assume you always have WHERE venue_id = constant.



    You have a multi-dimensional indexing problem. INDEXes are only one dimension, so things get tricky. However, partitioning can be used to sort of get a two-dimensional index.



    Let's pick day_epoch as the partition key and use PARTITION BY RANGE(day_epoch). (If you change that from a 4-byte INT to a 3-byte DATE, then use PARTITION BY RANGE(TO_DAYS(day_epoch))).



    Then let's decide on the PRIMARY KEY. Note: When adding or removing partitioning, the PK should be re-thought. Keep in mind that a PK is a unique index. And the data is clustered on the PK. (However, uniqueness is not guaranteed across partitions.)



    So...



    PARTITION BY RANGE(day_epoch)

    PRIMARY KEY(venue_id, zone_id, id) -- in this order


    Without partitioning, I recommend



    PRIMARY KEY(venue_id, zone_id, day_epoch, id)


    In general, any index (including the PK) should start with any column(s) that are tested with =. Then IN, then at most one 'range'.



    For the sake of the uniqueness requirement of the PK, I put the id last.



    So, the query performs something like this:



    1. "Partition pruning" -- probably down to a single partition, based on the date.

    2. Drill down the PK directly to the consecutive rows for the one venue_id in question.

    3. Hopscotch across the data based on the zone_ids. (In some situations, this may be a range scan instead of the jumping around. This depends on the version, number of ids, values of the ids, and perhaps the phase of the moon.

    4. (If it makes it this far) Then get the desired date.
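One way to check whether step 1 actually happens is to run EXPLAIN on the query. This is a sketch against a hypothetical table `venue_events` partitioned by RANGE(day_epoch); the literal values are placeholders.

```sql
-- Hypothetical table and values.
-- In MySQL 5.7 and later, the "partitions" column of the EXPLAIN output
-- lists the partitions the optimizer will read. On 5.6 and earlier,
-- use EXPLAIN PARTITIONS SELECT ... instead.
EXPLAIN
SELECT id, zone_id, day_epoch
FROM venue_events
WHERE venue_id = 42
  AND zone_id IN (3, 7, 11)
  AND day_epoch = 1546300800;
```

If the partitions column names every partition in the table, pruning did not take effect and the WHERE clause or the partitioning expression is worth another look.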

    When fetching lots of rows from a huge table, the most important thing is to minimize disk hits. What I just described probably does that better than the alternatives. Partitioning on venue_id helps only with that one column, but fails to help with the rest.
















        answered Apr 17 at 15:30









        Rick James


























