compiler optimization: move variable from stack to register
Here is the code:
#include <cstring>
#include <cstdint>
#include <cstddef>
uint64_t uint5korr(const std::byte *p)
{
    uint64_t result= 0;
    // Copy 5 bytes into the first 5 bytes of result; the remaining 3 stay zero.
    std::memcpy(&result, p, 5);
    return result;
}
https://godbolt.org/z/vULPAZ
Here clang keeps the variable result in a register, while gcc does not.
I suspect this is related to the fact that I take the address of the variable, since one cannot take the address of a register.
Is this simply a missed optimization in gcc, or does clang somehow violate the standard?
c++ gcc clang language-lawyer compiler-optimization
asked Mar 23 at 14:42 by Eugene Kosov (edited Mar 23 at 15:12)
1
It's obvious without benchmarking that putting a variable on stack here is slower. Basically, I want this code to be optimized by gcc and I wanna know will this be correct or not.
– Eugene Kosov
Mar 23 at 14:55
1
I'm pretty sure there are no requirements in the standard about how they optimize, as long as they work right. But I don't have anything to cite to back me up. Looking.
– Kenny Ostrom
Mar 23 at 14:58
1
Unless this code is being run in a very tight loop hundreds of thousands of times per second, I doubt you will ever be able to measure any meaningful performance difference. So, why really care?
– Jesper Juhl
Mar 23 at 15:02
1
it's better to click the clone compiler button so that we can compare the versions in the same window
– phuclv
Mar 23 at 15:07
1
Why are you copying only 5 bytes here? I didn't thoroughly check this with the standard yet, but it would seem to me that this could very well be undefined behavior. std::uint64_t is a trivially-copyable type, but you're not copying the whole object representation here…
– Michael Kenzel
Mar 23 at 15:26
2 Answers
Yes, this optimization is legitimate. 5 bytes (not 8) are read from the correct address; there’s no need to store them again just to read them for the return, address taken or no. I share Michael Kenzel’s skepticism that this has defined behavior, but that can only cement the validity of the optimization: if the behavior is undefined, any generated code is permitted.
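To illustrate the "address taken or no" point: taking the address of a local variable does not by itself oblige a compiler to keep that variable in memory, as long as the address never leaves the function and the observable result is unchanged. The following is only a minimal sketch; the function name add_one_via_pointer is hypothetical and does not come from the question.

```cpp
// Hypothetical example (not from the original post).
int add_one_via_pointer(int x)
{
    int local = x;      // a local whose address is taken below
    int *p = &local;    // the address never leaves this function
    *p += 1;            // modify the local through the pointer
    return local;       // same observable result as returning x + 1
}
```

Optimizing compilers commonly reduce a function like this to a single addition on a register, but that is a quality-of-implementation matter rather than a guarantee, so it is worth checking the generated assembly for the compiler and flags in question.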
answered Mar 23 at 18:57 by Davis Herring
Not a language-lawyer answer.
The optimization itself does seem to be missing in GCC, but, if I understand correctly, using a partially memcpy'ed value is undefined behaviour. I would file a bug against GCC to get a clear response on the subject.
A way to load a 40-bit integer that GCC, Clang, and MSVC all optimize perfectly:
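#include <cstddef>
#include <cstdint>
#include <cstring>
std::uint64_t load_u40(const std::byte *p)
{
    std::uint8_t lo = 0;
    std::memcpy(&lo, p, 1);
    std::uint32_t hi = 0;
    std::memcpy(&hi, p + 1, 4);
    return (static_cast<std::uint64_t>(hi) << 8) | lo; // low byte plus upper four bytes
}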
https://godbolt.org/z/4Kk9IM
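Both the memcpy version in the question and load_u40 above produce the intended value only on a little-endian host. As a hedged alternative, the sketch below assembles the value with shifts, so the result does not depend on the host byte order. The name load_le40 is hypothetical, and the buffer is assumed to hold the 40-bit value in little-endian order.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original post): builds a 40-bit
// little-endian integer from five bytes using shifts only, so the result
// is the same regardless of the byte order of the host.
std::uint64_t load_le40(const std::byte *p)
{
    std::uint64_t result = 0;
    for (int i = 0; i < 5; ++i) {
        // Byte i carries bits [8*i, 8*i + 7] of the value.
        result |= std::to_integer<std::uint64_t>(p[i]) << (8 * i);
    }
    return result;
}
```

Whether a particular compiler merges these five byte loads into wider loads is not guaranteed, so the generated code is worth comparing on Compiler Explorer before preferring this form in a hot path.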
answered Mar 24 at 18:17 by Nikita Kniazev
How is this a GCC bug?
– aschepler
Mar 24 at 18:49
Thanks for your answer! Here is my ticket on this topic in GCC bug tracker gcc.gnu.org/bugzilla/show_bug.cgi?id=89804
– Eugene Kosov
Mar 24 at 18:51
@aschepler missing optimization opportunity/suboptimal code generation is considered a bug. gcc.gnu.org/bugzilla/buglist.cgi?keywords=missed-optimization
– Nikita Kniazev
Mar 24 at 21:07