compiler optimization: move variable from stack to register


Here is the code:

```cpp
#include <cstring>
#include <cstdint>
#include <cstddef>

uint64_t uint5korr(const std::byte *p)
{
    uint64_t result = 0;
    std::memcpy(&result, p, 5);  // copy only the first 5 bytes at p
    return result;
}
```

https://godbolt.org/z/vULPAZ

Clang keeps `result` in a register here, while GCC does not.
I suspect this is related to the fact that I take the address of the variable, since a register has no address.

Is this simply a missed optimization in GCC, or does Clang violate the standard somehow?







c++ gcc clang language-lawyer compiler-optimization














asked Mar 23 at 14:42 by Eugene Kosov (edited Mar 23 at 15:12)







  • It's obvious without benchmarking that putting a variable on the stack here is slower. Basically, I want this code to be optimized by GCC, and I want to know whether that would be correct. – Eugene Kosov, Mar 23 at 14:55

  • I'm pretty sure there are no requirements in the standard about how compilers optimize, as long as the result works correctly. But I don't have anything to cite to back me up. Looking. – Kenny Ostrom, Mar 23 at 14:58

  • Unless this code is being run in a very tight loop hundreds of thousands of times per second, I doubt you will ever be able to measure any meaningful performance difference. So why care? – Jesper Juhl, Mar 23 at 15:02

  • It's better to click the clone compiler button so that we can compare the versions in the same window. – phuclv, Mar 23 at 15:07

  • Why are you copying only 5 bytes here? I haven't thoroughly checked this against the standard yet, but it seems to me that this could very well be undefined behavior. std::uint64_t is a trivially copyable type, but you're not copying the whole object representation here… – Michael Kenzel, Mar 23 at 15:26












2 Answers
Yes, this optimization is legitimate. 5 bytes (not 8) are read from the correct address; there's no need to store them again just to read them back for the return, whether or not the address is taken. I share Michael Kenzel's skepticism that this has defined behavior, but that can only cement the validity of the optimization.

















answered Mar 23 at 18:57 by Davis Herring
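To make the "no need to store them again" argument concrete, here is a minimal sketch rather than anything taken from the thread. `uint5korr_shifts`, `host_is_little_endian`, and `check_equivalence` are hypothetical names introduced only for illustration. The shift-based function never takes the address of a local, so nothing in it calls for a stack slot, and on a little-endian host it should return the same value as the `memcpy` version. That is the sense in which a compiler may keep `result` in a register: the observable result does not change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// The function from the question: copy 5 bytes into a zeroed 64-bit local.
std::uint64_t uint5korr(const std::byte *p)
{
    std::uint64_t result = 0;
    std::memcpy(&result, p, 5);
    return result;
}

// Hypothetical shift-based counterpart (illustrative name): p[0] becomes the
// least significant byte, and no local variable has its address taken.
std::uint64_t uint5korr_shifts(const std::byte *p)
{
    std::uint64_t result = 0;
    for (int i = 0; i < 5; ++i)
        result |= std::to_integer<std::uint64_t>(p[i]) << (8 * i);
    return result;
}

// True when the host stores the least significant byte first.
bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Assumption: the memcpy version matches the shift version only on a
// little-endian host, so the comparison is guarded.
void check_equivalence()
{
    const std::byte buf[8] = {std::byte{0x01}, std::byte{0x02}, std::byte{0x03},
                              std::byte{0x04}, std::byte{0x05}, std::byte{0xFF},
                              std::byte{0xFF}, std::byte{0xFF}};
    assert(uint5korr_shifts(buf) == 0x0504030201ULL);
    if (host_is_little_endian())
        assert(uint5korr(buf) == uint5korr_shifts(buf));
}
```

The three trailing `0xFF` bytes are there to show that neither function reads past the fifth byte.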























Not a language-lawyer answer.

The optimization itself does seem to be missing in GCC, but using a partially memcpy-ed value is, if I understand correctly, undefined behaviour. I would file a bug against GCC to get a clear response on the subject.

A way to load a 40-bit integer that GCC, Clang, and MSVC all optimize perfectly:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

std::uint64_t load_u40(const std::byte *p)
{
    std::uint8_t lo = 0;
    std::memcpy(&lo, p, 1);      // byte 0
    std::uint32_t hi = 0;
    std::memcpy(&hi, p + 1, 4);  // bytes 1..4
    return (static_cast<std::uint64_t>(hi) << 8) | lo;
}
```

https://godbolt.org/z/4Kk9IM

















answered Mar 24 at 18:17 by Nikita Kniazev












  • How is this a GCC bug? – aschepler, Mar 24 at 18:49

  • Thanks for your answer! Here is my ticket on this topic in the GCC bug tracker: gcc.gnu.org/bugzilla/show_bug.cgi?id=89804 – Eugene Kosov, Mar 24 at 18:51

  • @aschepler A missed optimization opportunity or suboptimal code generation is considered a bug: gcc.gnu.org/bugzilla/buglist.cgi?keywords=missed-optimization – Nikita Kniazev, Mar 24 at 21:07
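As a quick sanity check of the two-load approach, the sketch below compares `load_u40` with the question's `uint5korr` on one sample buffer. It assumes a little-endian host such as x86-64. `host_is_little_endian` and `check_load_u40` are hypothetical helpers added for illustration, and the closing `| lo` of the return expression is taken from the evident intent of the answer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// The question's version: one 5-byte memcpy into a zeroed 64-bit local.
std::uint64_t uint5korr(const std::byte *p)
{
    std::uint64_t result = 0;
    std::memcpy(&result, p, 5);
    return result;
}

// The answer's version: a 1-byte load and a 4-byte load, then combined.
std::uint64_t load_u40(const std::byte *p)
{
    std::uint8_t lo = 0;
    std::memcpy(&lo, p, 1);
    std::uint32_t hi = 0;
    std::memcpy(&hi, p + 1, 4);
    return (static_cast<std::uint64_t>(hi) << 8) | lo;
}

// True when the host stores the least significant byte first.
bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical check: both functions should agree on a little-endian host.
void check_load_u40()
{
    const std::byte buf[5] = {std::byte{0x11}, std::byte{0x22}, std::byte{0x33},
                              std::byte{0x44}, std::byte{0x55}};
    if (!host_is_little_endian())
        return;  // the byte-order assumption does not hold on this host
    assert(load_u40(buf) == 0x5544332211ULL);
    assert(load_u40(buf) == uint5korr(buf));
}
```

The buffer is exactly five bytes long, so the check also exercises the property that neither function touches memory beyond `p + 4`.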
















