Rewriting code to remove loop-carried dependenceHow do I tell Maven to use the latest version of a dependency?Unit Testing C CodeHow can I profile C++ code running on Linux?Improve INSERT-per-second performance of SQLite?Why are elementwise additions much faster in separate loops than in a combined loop?What is “:-!!” in C code?Can code that is valid in both C and C++ produce different behavior when compiled in each language?Obfuscated C Code Contest 2006. Please explain sykes2.cWhy does npm install say I have unmet dependencies?Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations
How do you manage to study and have a balance in your life at the same time?
Should we run PBKDF2 for every plaintext to be protected or should we run PBKDF2 only once?
How can I portray a character with no fear of death, without them sounding utterly bored?
How can I store milk for long periods of time?
Underbrace in equation
Can my UK debt be collected because I have to return to US?
Colored grid with coordinates on all sides?
Why is this reversed StringBuilder not equal to the original String, when it is a palindrome?
Blender - Alpha is Luminance equivalent
Ideas behind the 8.Bd3 line in the 4.Ng5 Two Knights Defense
Missing $ inserted. Extra }, or forgotten $. Missing } inserted
How could reincarnation magic be limited to prevent overuse?
Was there an original & definitive use of alternate dimensions/realities in fiction?
How do I get my neighbour to stop disturbing with loud music?
Do universities maintain secret textbooks?
What is the motivation behind designing a control stick that does not move?
Can a system of three stars exist?
Is Chuck the Evil Sandwich Making Guy's head actually a sandwich?
Doesn't the concept of marginal utility speak to a cardinal utility function?
How were US credit cards verified in-store in the 1980's?
Received email from ISP saying one of my devices has malware
When you have to wait for a short time
Correct way of simplifying the result of an integral
How is the casino term "a high roller" commonly expressed in German?
Rewriting code to remove loop-carried dependence
How do I tell Maven to use the latest version of a dependency?Unit Testing C CodeHow can I profile C++ code running on Linux?Improve INSERT-per-second performance of SQLite?Why are elementwise additions much faster in separate loops than in a combined loop?What is “:-!!” in C code?Can code that is valid in both C and C++ produce different behavior when compiled in each language?Obfuscated C Code Contest 2006. Please explain sykes2.cWhy does npm install say I have unmet dependencies?Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
I'm trying to rewrite this outer for
loop to allow multiple threads to execute the iterations in parallel. Actually, for now I just want two threads to compute this although a more general solution would be ideal. The issue is that there's a loop carried dependence: nval[j]
is added to the previous value of nval[j]
along with this iteration's value of avval
.
void loop_func(Arr* a, int blen, double* nval)
// an Arr is a struct of an array and len field
assert(a->len < blen);
long long int i = 0, j = 0, jlo = 0, jhi = 0;
double avval = 0.0;
for (i = 0; i < blen; i++)
nval[i] = 0.0;
const double quot = static_cast<double>(blen) / static_cast<double>(a->len);
for (auto i = 1; i < a->len - 1; i++)
j_lower = (int)(0.5 * (2 * i - 1) * quot);
j_upper = (int)(0.5 * (2 * i + 1) * quot);
printf("a->val index: %lldttnval index: %lld-%lldn", i, j_lower, j_upper);
avval = a->val[i] / (j_upper - j_lower + 1);
for (j = j_lower; j <= j_upper; j++)
nval[j] += avval;
For convenience, I'm printing out some details that were printed via the printf
above.
a->val index: 1 nval index: 1-3
a->val index: 2 nval index: 3-5
a->val index: 3 nval index: 5-7
a->val index: 4 nval index: 7-9
a->val index: 5 nval index: 9-11
a->val index: 6 nval index: 11-13
I'd ideally want to have thread 1 handle the a->val index 1-3 and thread 2 handle the a-val index 4-6.
Question 1: can anyone describe a code transformation that would remove this dependence?
Question 2: are there C/C++ tools that can do this? Maybe something built with LLVM? I will likely have a number of different situations like this where I need to do some parallel execution. Similarly, if there are general techniques that can be applied to remove such loop-carried dependencies, I'd like to learn about this more generally.
c++ c parallel-processing dependencies
add a comment |
I'm trying to rewrite this outer for
loop to allow multiple threads to execute the iterations in parallel. Actually, for now I just want two threads to compute this although a more general solution would be ideal. The issue is that there's a loop carried dependence: nval[j]
is added to the previous value of nval[j]
along with this iteration's value of avval
.
void loop_func(Arr* a, int blen, double* nval)
// an Arr is a struct of an array and len field
assert(a->len < blen);
long long int i = 0, j = 0, jlo = 0, jhi = 0;
double avval = 0.0;
for (i = 0; i < blen; i++)
nval[i] = 0.0;
const double quot = static_cast<double>(blen) / static_cast<double>(a->len);
for (auto i = 1; i < a->len - 1; i++)
j_lower = (int)(0.5 * (2 * i - 1) * quot);
j_upper = (int)(0.5 * (2 * i + 1) * quot);
printf("a->val index: %lldttnval index: %lld-%lldn", i, j_lower, j_upper);
avval = a->val[i] / (j_upper - j_lower + 1);
for (j = j_lower; j <= j_upper; j++)
nval[j] += avval;
For convenience, I'm printing out some details that were printed via the printf
above.
a->val index: 1 nval index: 1-3
a->val index: 2 nval index: 3-5
a->val index: 3 nval index: 5-7
a->val index: 4 nval index: 7-9
a->val index: 5 nval index: 9-11
a->val index: 6 nval index: 11-13
I'd ideally want to have thread 1 handle the a->val index 1-3 and thread 2 handle the a-val index 4-6.
Question 1: can anyone describe a code transformation that would remove this dependence?
Question 2: are there C/C++ tools that can do this? Maybe something built with LLVM? I will likely have a number of different situations like this where I need to do some parallel execution. Similarly, if there are general techniques that can be applied to remove such loop-carried dependencies, I'd like to learn about this more generally.
c++ c parallel-processing dependencies
You can parameterize a thread function based on the start index of the for-loop and create a thread-safe job queue which keeps a record of which indexes have already been handed out.
– Mansoor
Mar 28 at 0:45
Why are you doing0.5 * x * 2
? Isnval
in-out, or does it always start as0
?(j_upper - j_lower + 1);
that is a fancy way to say1
. Either this code is nonsense or not a minimal reproducible example. Are those arrays 1 based?!
– Yakk - Adam Nevraumont
Mar 28 at 0:52
quot
in one case that I'm running is 2 so I put that here to make it concrete but it is actually different in the actual application. The arrays are zero-indexed (typical C arrays).
– Kulluk007
Mar 28 at 0:59
I updated the example to be more complete.nval
is output only -- it's zeroed out before the computation
– Kulluk007
Mar 28 at 1:09
So the modifiednval
forval = i
andval = i+1
overlap. Then you would need someway to handle this overlap. maybe atomics. Or since you do not modify the input(val
): you could decide to handlenval
1-7 by thread 1 andnval
7-13 by thread 2, then you would not need any synchronisation.
– generic_opto_guy
Mar 28 at 6:33
add a comment |
I'm trying to rewrite this outer for
loop to allow multiple threads to execute the iterations in parallel. Actually, for now I just want two threads to compute this although a more general solution would be ideal. The issue is that there's a loop carried dependence: nval[j]
is added to the previous value of nval[j]
along with this iteration's value of avval
.
void loop_func(Arr* a, int blen, double* nval)
// an Arr is a struct of an array and len field
assert(a->len < blen);
long long int i = 0, j = 0, jlo = 0, jhi = 0;
double avval = 0.0;
for (i = 0; i < blen; i++)
nval[i] = 0.0;
const double quot = static_cast<double>(blen) / static_cast<double>(a->len);
for (auto i = 1; i < a->len - 1; i++)
j_lower = (int)(0.5 * (2 * i - 1) * quot);
j_upper = (int)(0.5 * (2 * i + 1) * quot);
printf("a->val index: %lldttnval index: %lld-%lldn", i, j_lower, j_upper);
avval = a->val[i] / (j_upper - j_lower + 1);
for (j = j_lower; j <= j_upper; j++)
nval[j] += avval;
For convenience, I'm printing out some details that were printed via the printf
above.
a->val index: 1 nval index: 1-3
a->val index: 2 nval index: 3-5
a->val index: 3 nval index: 5-7
a->val index: 4 nval index: 7-9
a->val index: 5 nval index: 9-11
a->val index: 6 nval index: 11-13
I'd ideally want to have thread 1 handle the a->val index 1-3 and thread 2 handle the a-val index 4-6.
Question 1: can anyone describe a code transformation that would remove this dependence?
Question 2: are there C/C++ tools that can do this? Maybe something built with LLVM? I will likely have a number of different situations like this where I need to do some parallel execution. Similarly, if there are general techniques that can be applied to remove such loop-carried dependencies, I'd like to learn about this more generally.
c++ c parallel-processing dependencies
I'm trying to rewrite this outer for
loop to allow multiple threads to execute the iterations in parallel. Actually, for now I just want two threads to compute this although a more general solution would be ideal. The issue is that there's a loop carried dependence: nval[j]
is added to the previous value of nval[j]
along with this iteration's value of avval
.
void loop_func(Arr* a, int blen, double* nval)
// an Arr is a struct of an array and len field
assert(a->len < blen);
long long int i = 0, j = 0, jlo = 0, jhi = 0;
double avval = 0.0;
for (i = 0; i < blen; i++)
nval[i] = 0.0;
const double quot = static_cast<double>(blen) / static_cast<double>(a->len);
for (auto i = 1; i < a->len - 1; i++)
j_lower = (int)(0.5 * (2 * i - 1) * quot);
j_upper = (int)(0.5 * (2 * i + 1) * quot);
printf("a->val index: %lldttnval index: %lld-%lldn", i, j_lower, j_upper);
avval = a->val[i] / (j_upper - j_lower + 1);
for (j = j_lower; j <= j_upper; j++)
nval[j] += avval;
For convenience, I'm printing out some details that were printed via the printf
above.
a->val index: 1 nval index: 1-3
a->val index: 2 nval index: 3-5
a->val index: 3 nval index: 5-7
a->val index: 4 nval index: 7-9
a->val index: 5 nval index: 9-11
a->val index: 6 nval index: 11-13
I'd ideally want to have thread 1 handle the a->val index 1-3 and thread 2 handle the a-val index 4-6.
Question 1: can anyone describe a code transformation that would remove this dependence?
Question 2: are there C/C++ tools that can do this? Maybe something built with LLVM? I will likely have a number of different situations like this where I need to do some parallel execution. Similarly, if there are general techniques that can be applied to remove such loop-carried dependencies, I'd like to learn about this more generally.
c++ c parallel-processing dependencies
c++ c parallel-processing dependencies
edited Mar 28 at 1:07
Kulluk007
asked Mar 28 at 0:35
Kulluk007Kulluk007
3004 silver badges14 bronze badges
3004 silver badges14 bronze badges
You can parameterize a thread function based on the start index of the for-loop and create a thread-safe job queue which keeps a record of which indexes have already been handed out.
– Mansoor
Mar 28 at 0:45
Why are you doing0.5 * x * 2
? Isnval
in-out, or does it always start as0
?(j_upper - j_lower + 1);
that is a fancy way to say1
. Either this code is nonsense or not a minimal reproducible example. Are those arrays 1 based?!
– Yakk - Adam Nevraumont
Mar 28 at 0:52
quot
in one case that I'm running is 2 so I put that here to make it concrete but it is actually different in the actual application. The arrays are zero-indexed (typical C arrays).
– Kulluk007
Mar 28 at 0:59
I updated the example to be more complete.nval
is output only -- it's zeroed out before the computation
– Kulluk007
Mar 28 at 1:09
So the modifiednval
forval = i
andval = i+1
overlap. Then you would need someway to handle this overlap. maybe atomics. Or since you do not modify the input(val
): you could decide to handlenval
1-7 by thread 1 andnval
7-13 by thread 2, then you would not need any synchronisation.
– generic_opto_guy
Mar 28 at 6:33
add a comment |
You can parameterize a thread function based on the start index of the for-loop and create a thread-safe job queue which keeps a record of which indexes have already been handed out.
– Mansoor
Mar 28 at 0:45
Why are you doing0.5 * x * 2
? Isnval
in-out, or does it always start as0
?(j_upper - j_lower + 1);
that is a fancy way to say1
. Either this code is nonsense or not a minimal reproducible example. Are those arrays 1 based?!
– Yakk - Adam Nevraumont
Mar 28 at 0:52
quot
in one case that I'm running is 2 so I put that here to make it concrete but it is actually different in the actual application. The arrays are zero-indexed (typical C arrays).
– Kulluk007
Mar 28 at 0:59
I updated the example to be more complete.nval
is output only -- it's zeroed out before the computation
– Kulluk007
Mar 28 at 1:09
So the modifiednval
forval = i
andval = i+1
overlap. Then you would need someway to handle this overlap. maybe atomics. Or since you do not modify the input(val
): you could decide to handlenval
1-7 by thread 1 andnval
7-13 by thread 2, then you would not need any synchronisation.
– generic_opto_guy
Mar 28 at 6:33
You can parameterize a thread function based on the start index of the for-loop and create a thread-safe job queue which keeps a record of which indexes have already been handed out.
– Mansoor
Mar 28 at 0:45
You can parameterize a thread function based on the start index of the for-loop and create a thread-safe job queue which keeps a record of which indexes have already been handed out.
– Mansoor
Mar 28 at 0:45
Why are you doing
0.5 * x * 2
? Is nval
in-out, or does it always start as 0
? (j_upper - j_lower + 1);
that is a fancy way to say 1
. Either this code is nonsense or not a minimal reproducible example. Are those arrays 1 based?!– Yakk - Adam Nevraumont
Mar 28 at 0:52
Why are you doing
0.5 * x * 2
? Is nval
in-out, or does it always start as 0
? (j_upper - j_lower + 1);
that is a fancy way to say 1
. Either this code is nonsense or not a minimal reproducible example. Are those arrays 1 based?!– Yakk - Adam Nevraumont
Mar 28 at 0:52
quot
in one case that I'm running is 2 so I put that here to make it concrete but it is actually different in the actual application. The arrays are zero-indexed (typical C arrays).– Kulluk007
Mar 28 at 0:59
quot
in one case that I'm running is 2 so I put that here to make it concrete but it is actually different in the actual application. The arrays are zero-indexed (typical C arrays).– Kulluk007
Mar 28 at 0:59
I updated the example to be more complete.
nval
is output only -- it's zeroed out before the computation– Kulluk007
Mar 28 at 1:09
I updated the example to be more complete.
nval
is output only -- it's zeroed out before the computation– Kulluk007
Mar 28 at 1:09
So the modified
nval
for val = i
and val = i+1
overlap. Then you would need someway to handle this overlap. maybe atomics. Or since you do not modify the input(val
): you could decide to handle nval
1-7 by thread 1 and nval
7-13 by thread 2, then you would not need any synchronisation.– generic_opto_guy
Mar 28 at 6:33
So the modified
nval
for val = i
and val = i+1
overlap. Then you would need someway to handle this overlap. maybe atomics. Or since you do not modify the input(val
): you could decide to handle nval
1-7 by thread 1 and nval
7-13 by thread 2, then you would not need any synchronisation.– generic_opto_guy
Mar 28 at 6:33
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55388510%2frewriting-code-to-remove-loop-carried-dependence%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Is this question similar to what you get asked at work? Learn more about asking and sharing private information with your coworkers using Stack Overflow for Teams.
Is this question similar to what you get asked at work? Learn more about asking and sharing private information with your coworkers using Stack Overflow for Teams.
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55388510%2frewriting-code-to-remove-loop-carried-dependence%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
You can parameterize a thread function based on the start index of the for-loop and create a thread-safe job queue which keeps a record of which indexes have already been handed out.
– Mansoor
Mar 28 at 0:45
Why are you doing
0.5 * x * 2
? Isnval
in-out, or does it always start as0
?(j_upper - j_lower + 1);
that is a fancy way to say1
. Either this code is nonsense or not a minimal reproducible example. Are those arrays 1 based?!– Yakk - Adam Nevraumont
Mar 28 at 0:52
quot
in one case that I'm running is 2 so I put that here to make it concrete but it is actually different in the actual application. The arrays are zero-indexed (typical C arrays).– Kulluk007
Mar 28 at 0:59
I updated the example to be more complete.
nval
is output only -- it's zeroed out before the computation– Kulluk007
Mar 28 at 1:09
So the modified
nval
forval = i
andval = i+1
overlap. Then you would need someway to handle this overlap. maybe atomics. Or since you do not modify the input(val
): you could decide to handlenval
1-7 by thread 1 andnval
7-13 by thread 2, then you would not need any synchronisation.– generic_opto_guy
Mar 28 at 6:33