Rewriting code to remove loop-carried dependenceHow do I tell Maven to use the latest version of a dependency?Unit Testing C CodeHow can I profile C++ code running on Linux?Improve INSERT-per-second performance of SQLite?Why are elementwise additions much faster in separate loops than in a combined loop?What is “:-!!” in C code?Can code that is valid in both C and C++ produce different behavior when compiled in each language?Obfuscated C Code Contest 2006. Please explain sykes2.cWhy does npm install say I have unmet dependencies?Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations

How do you manage to study and have a balance in your life at the same time?

Should we run PBKDF2 for every plaintext to be protected or should we run PBKDF2 only once?

How can I portray a character with no fear of death, without them sounding utterly bored?

How can I store milk for long periods of time?

Underbrace in equation

Can my UK debt be collected because I have to return to US?

Colored grid with coordinates on all sides?

Why is this reversed StringBuilder not equal to the original String, when it is a palindrome?

Blender - Alpha is Luminance equivalent

Ideas behind the 8.Bd3 line in the 4.Ng5 Two Knights Defense

Missing $ inserted. Extra }, or forgotten $. Missing } inserted

How could reincarnation magic be limited to prevent overuse?

Was there an original & definitive use of alternate dimensions/realities in fiction?

How do I get my neighbour to stop disturbing with loud music?

Do universities maintain secret textbooks?

What is the motivation behind designing a control stick that does not move?

Can a system of three stars exist?

Is Chuck the Evil Sandwich Making Guy's head actually a sandwich?

Doesn't the concept of marginal utility speak to a cardinal utility function?

How were US credit cards verified in-store in the 1980's?

Received email from ISP saying one of my devices has malware

When you have to wait for a short time

Correct way of simplifying the result of an integral

How is the casino term "a high roller" commonly expressed in German?



Rewriting code to remove loop-carried dependence


How do I tell Maven to use the latest version of a dependency?Unit Testing C CodeHow can I profile C++ code running on Linux?Improve INSERT-per-second performance of SQLite?Why are elementwise additions much faster in separate loops than in a combined loop?What is “:-!!” in C code?Can code that is valid in both C and C++ produce different behavior when compiled in each language?Obfuscated C Code Contest 2006. Please explain sykes2.cWhy does npm install say I have unmet dependencies?Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








1















I'm trying to rewrite this outer for loop to allow multiple threads to execute the iterations in parallel. Actually, for now I just want two threads to compute this although a more general solution would be ideal. The issue is that there's a loop carried dependence: nval[j] is added to the previous value of nval[j] along with this iteration's value of avval.



void loop_func(Arr* a, int blen, double* nval) 
// an Arr is a struct of an array and len field
assert(a->len < blen);
long long int i = 0, j = 0, jlo = 0, jhi = 0;
double avval = 0.0;

for (i = 0; i < blen; i++)
nval[i] = 0.0;
const double quot = static_cast<double>(blen) / static_cast<double>(a->len);
for (auto i = 1; i < a->len - 1; i++)
j_lower = (int)(0.5 * (2 * i - 1) * quot);
j_upper = (int)(0.5 * (2 * i + 1) * quot);
printf("a->val index: %lldttnval index: %lld-%lldn", i, j_lower, j_upper);
avval = a->val[i] / (j_upper - j_lower + 1);

for (j = j_lower; j <= j_upper; j++)
nval[j] += avval;





For convenience, I'm printing out some details that were printed via the printf above.



a->val index: 1 nval index: 1-3
a->val index: 2 nval index: 3-5
a->val index: 3 nval index: 5-7
a->val index: 4 nval index: 7-9
a->val index: 5 nval index: 9-11
a->val index: 6 nval index: 11-13


I'd ideally want to have thread 1 handle the a->val index 1-3 and thread 2 handle the a-val index 4-6.



Question 1: can anyone describe a code transformation that would remove this dependence?



Question 2: are there C/C++ tools that can do this? Maybe something built with LLVM? I will likely have a number of different situations like this where I need to do some parallel execution. Similarly, if there are general techniques that can be applied to remove such loop-carried dependencies, I'd like to learn about this more generally.










share|improve this question


























  • You can parameterize a thread function based on the start index of the for-loop and create a thread-safe job queue which keeps a record of which indexes have already been handed out.

    – Mansoor
    Mar 28 at 0:45












  • Why are you doing 0.5 * x * 2? Is nval in-out, or does it always start as 0? (j_upper - j_lower + 1); that is a fancy way to say 1. Either this code is nonsense or not a minimal reproducible example. Are those arrays 1 based?!

    – Yakk - Adam Nevraumont
    Mar 28 at 0:52












  • quot in one case that I'm running is 2 so I put that here to make it concrete but it is actually different in the actual application. The arrays are zero-indexed (typical C arrays).

    – Kulluk007
    Mar 28 at 0:59











  • I updated the example to be more complete. nval is output only -- it's zeroed out before the computation

    – Kulluk007
    Mar 28 at 1:09











  • So the modified nvalfor val = i and val = i+1 overlap. Then you would need someway to handle this overlap. maybe atomics. Or since you do not modify the input(val): you could decide to handle nval 1-7 by thread 1 and nval 7-13 by thread 2, then you would not need any synchronisation.

    – generic_opto_guy
    Mar 28 at 6:33

















1















I'm trying to rewrite this outer for loop to allow multiple threads to execute the iterations in parallel. Actually, for now I just want two threads to compute this although a more general solution would be ideal. The issue is that there's a loop carried dependence: nval[j] is added to the previous value of nval[j] along with this iteration's value of avval.



void loop_func(Arr* a, int blen, double* nval) 
// an Arr is a struct of an array and len field
assert(a->len < blen);
long long int i = 0, j = 0, jlo = 0, jhi = 0;
double avval = 0.0;

for (i = 0; i < blen; i++)
nval[i] = 0.0;
const double quot = static_cast<double>(blen) / static_cast<double>(a->len);
for (auto i = 1; i < a->len - 1; i++)
j_lower = (int)(0.5 * (2 * i - 1) * quot);
j_upper = (int)(0.5 * (2 * i + 1) * quot);
printf("a->val index: %lldttnval index: %lld-%lldn", i, j_lower, j_upper);
avval = a->val[i] / (j_upper - j_lower + 1);

for (j = j_lower; j <= j_upper; j++)
nval[j] += avval;





For convenience, I'm printing out some details that were printed via the printf above.



a->val index: 1 nval index: 1-3
a->val index: 2 nval index: 3-5
a->val index: 3 nval index: 5-7
a->val index: 4 nval index: 7-9
a->val index: 5 nval index: 9-11
a->val index: 6 nval index: 11-13


I'd ideally want to have thread 1 handle the a->val index 1-3 and thread 2 handle the a-val index 4-6.



Question 1: can anyone describe a code transformation that would remove this dependence?



Question 2: are there C/C++ tools that can do this? Maybe something built with LLVM? I will likely have a number of different situations like this where I need to do some parallel execution. Similarly, if there are general techniques that can be applied to remove such loop-carried dependencies, I'd like to learn about this more generally.










share|improve this question


























  • You can parameterize a thread function based on the start index of the for-loop and create a thread-safe job queue which keeps a record of which indexes have already been handed out.

    – Mansoor
    Mar 28 at 0:45












  • Why are you doing 0.5 * x * 2? Is nval in-out, or does it always start as 0? (j_upper - j_lower + 1); that is a fancy way to say 1. Either this code is nonsense or not a minimal reproducible example. Are those arrays 1 based?!

    – Yakk - Adam Nevraumont
    Mar 28 at 0:52












  • quot in one case that I'm running is 2 so I put that here to make it concrete but it is actually different in the actual application. The arrays are zero-indexed (typical C arrays).

    – Kulluk007
    Mar 28 at 0:59











  • I updated the example to be more complete. nval is output only -- it's zeroed out before the computation

    – Kulluk007
    Mar 28 at 1:09











  • So the modified nvalfor val = i and val = i+1 overlap. Then you would need someway to handle this overlap. maybe atomics. Or since you do not modify the input(val): you could decide to handle nval 1-7 by thread 1 and nval 7-13 by thread 2, then you would not need any synchronisation.

    – generic_opto_guy
    Mar 28 at 6:33













1












1








1








I'm trying to rewrite this outer for loop to allow multiple threads to execute the iterations in parallel. Actually, for now I just want two threads to compute this although a more general solution would be ideal. The issue is that there's a loop carried dependence: nval[j] is added to the previous value of nval[j] along with this iteration's value of avval.



void loop_func(Arr* a, int blen, double* nval) 
// an Arr is a struct of an array and len field
assert(a->len < blen);
long long int i = 0, j = 0, jlo = 0, jhi = 0;
double avval = 0.0;

for (i = 0; i < blen; i++)
nval[i] = 0.0;
const double quot = static_cast<double>(blen) / static_cast<double>(a->len);
for (auto i = 1; i < a->len - 1; i++)
j_lower = (int)(0.5 * (2 * i - 1) * quot);
j_upper = (int)(0.5 * (2 * i + 1) * quot);
printf("a->val index: %lldttnval index: %lld-%lldn", i, j_lower, j_upper);
avval = a->val[i] / (j_upper - j_lower + 1);

for (j = j_lower; j <= j_upper; j++)
nval[j] += avval;





For convenience, I'm printing out some details that were printed via the printf above.



a->val index: 1 nval index: 1-3
a->val index: 2 nval index: 3-5
a->val index: 3 nval index: 5-7
a->val index: 4 nval index: 7-9
a->val index: 5 nval index: 9-11
a->val index: 6 nval index: 11-13


I'd ideally want to have thread 1 handle the a->val index 1-3 and thread 2 handle the a-val index 4-6.



Question 1: can anyone describe a code transformation that would remove this dependence?



Question 2: are there C/C++ tools that can do this? Maybe something built with LLVM? I will likely have a number of different situations like this where I need to do some parallel execution. Similarly, if there are general techniques that can be applied to remove such loop-carried dependencies, I'd like to learn about this more generally.










share|improve this question
















I'm trying to rewrite this outer for loop to allow multiple threads to execute the iterations in parallel. Actually, for now I just want two threads to compute this although a more general solution would be ideal. The issue is that there's a loop carried dependence: nval[j] is added to the previous value of nval[j] along with this iteration's value of avval.



void loop_func(Arr* a, int blen, double* nval) 
// an Arr is a struct of an array and len field
assert(a->len < blen);
long long int i = 0, j = 0, jlo = 0, jhi = 0;
double avval = 0.0;

for (i = 0; i < blen; i++)
nval[i] = 0.0;
const double quot = static_cast<double>(blen) / static_cast<double>(a->len);
for (auto i = 1; i < a->len - 1; i++)
j_lower = (int)(0.5 * (2 * i - 1) * quot);
j_upper = (int)(0.5 * (2 * i + 1) * quot);
printf("a->val index: %lldttnval index: %lld-%lldn", i, j_lower, j_upper);
avval = a->val[i] / (j_upper - j_lower + 1);

for (j = j_lower; j <= j_upper; j++)
nval[j] += avval;





For convenience, I'm printing out some details that were printed via the printf above.



a->val index: 1 nval index: 1-3
a->val index: 2 nval index: 3-5
a->val index: 3 nval index: 5-7
a->val index: 4 nval index: 7-9
a->val index: 5 nval index: 9-11
a->val index: 6 nval index: 11-13


I'd ideally want to have thread 1 handle the a->val index 1-3 and thread 2 handle the a-val index 4-6.



Question 1: can anyone describe a code transformation that would remove this dependence?



Question 2: are there C/C++ tools that can do this? Maybe something built with LLVM? I will likely have a number of different situations like this where I need to do some parallel execution. Similarly, if there are general techniques that can be applied to remove such loop-carried dependencies, I'd like to learn about this more generally.







c++ c parallel-processing dependencies






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Mar 28 at 1:07







Kulluk007

















asked Mar 28 at 0:35









Kulluk007Kulluk007

3004 silver badges14 bronze badges




3004 silver badges14 bronze badges















  • You can parameterize a thread function based on the start index of the for-loop and create a thread-safe job queue which keeps a record of which indexes have already been handed out.

    – Mansoor
    Mar 28 at 0:45












  • Why are you doing 0.5 * x * 2? Is nval in-out, or does it always start as 0? (j_upper - j_lower + 1); that is a fancy way to say 1. Either this code is nonsense or not a minimal reproducible example. Are those arrays 1 based?!

    – Yakk - Adam Nevraumont
    Mar 28 at 0:52












  • quot in one case that I'm running is 2 so I put that here to make it concrete but it is actually different in the actual application. The arrays are zero-indexed (typical C arrays).

    – Kulluk007
    Mar 28 at 0:59











  • I updated the example to be more complete. nval is output only -- it's zeroed out before the computation

    – Kulluk007
    Mar 28 at 1:09











  • So the modified nvalfor val = i and val = i+1 overlap. Then you would need someway to handle this overlap. maybe atomics. Or since you do not modify the input(val): you could decide to handle nval 1-7 by thread 1 and nval 7-13 by thread 2, then you would not need any synchronisation.

    – generic_opto_guy
    Mar 28 at 6:33

















  • You can parameterize a thread function based on the start index of the for-loop and create a thread-safe job queue which keeps a record of which indexes have already been handed out.

    – Mansoor
    Mar 28 at 0:45












  • Why are you doing 0.5 * x * 2? Is nval in-out, or does it always start as 0? (j_upper - j_lower + 1); that is a fancy way to say 1. Either this code is nonsense or not a minimal reproducible example. Are those arrays 1 based?!

    – Yakk - Adam Nevraumont
    Mar 28 at 0:52












  • quot in one case that I'm running is 2 so I put that here to make it concrete but it is actually different in the actual application. The arrays are zero-indexed (typical C arrays).

    – Kulluk007
    Mar 28 at 0:59











  • I updated the example to be more complete. nval is output only -- it's zeroed out before the computation

    – Kulluk007
    Mar 28 at 1:09











  • So the modified nvalfor val = i and val = i+1 overlap. Then you would need someway to handle this overlap. maybe atomics. Or since you do not modify the input(val): you could decide to handle nval 1-7 by thread 1 and nval 7-13 by thread 2, then you would not need any synchronisation.

    – generic_opto_guy
    Mar 28 at 6:33
















You can parameterize a thread function based on the start index of the for-loop and create a thread-safe job queue which keeps a record of which indexes have already been handed out.

– Mansoor
Mar 28 at 0:45






You can parameterize a thread function based on the start index of the for-loop and create a thread-safe job queue which keeps a record of which indexes have already been handed out.

– Mansoor
Mar 28 at 0:45














Why are you doing 0.5 * x * 2? Is nval in-out, or does it always start as 0? (j_upper - j_lower + 1); that is a fancy way to say 1. Either this code is nonsense or not a minimal reproducible example. Are those arrays 1 based?!

– Yakk - Adam Nevraumont
Mar 28 at 0:52






Why are you doing 0.5 * x * 2? Is nval in-out, or does it always start as 0? (j_upper - j_lower + 1); that is a fancy way to say 1. Either this code is nonsense or not a minimal reproducible example. Are those arrays 1 based?!

– Yakk - Adam Nevraumont
Mar 28 at 0:52














quot in one case that I'm running is 2 so I put that here to make it concrete but it is actually different in the actual application. The arrays are zero-indexed (typical C arrays).

– Kulluk007
Mar 28 at 0:59





quot in one case that I'm running is 2 so I put that here to make it concrete but it is actually different in the actual application. The arrays are zero-indexed (typical C arrays).

– Kulluk007
Mar 28 at 0:59













I updated the example to be more complete. nval is output only -- it's zeroed out before the computation

– Kulluk007
Mar 28 at 1:09





I updated the example to be more complete. nval is output only -- it's zeroed out before the computation

– Kulluk007
Mar 28 at 1:09













So the modified nvalfor val = i and val = i+1 overlap. Then you would need someway to handle this overlap. maybe atomics. Or since you do not modify the input(val): you could decide to handle nval 1-7 by thread 1 and nval 7-13 by thread 2, then you would not need any synchronisation.

– generic_opto_guy
Mar 28 at 6:33





So the modified nvalfor val = i and val = i+1 overlap. Then you would need someway to handle this overlap. maybe atomics. Or since you do not modify the input(val): you could decide to handle nval 1-7 by thread 1 and nval 7-13 by thread 2, then you would not need any synchronisation.

– generic_opto_guy
Mar 28 at 6:33












0






active

oldest

votes










Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55388510%2frewriting-code-to-remove-loop-carried-dependence%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes




Is this question similar to what you get asked at work? Learn more about asking and sharing private information with your coworkers using Stack Overflow for Teams.







Is this question similar to what you get asked at work? Learn more about asking and sharing private information with your coworkers using Stack Overflow for Teams.



















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55388510%2frewriting-code-to-remove-loop-carried-dependence%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Kamusi Yaliyomo Aina za kamusi | Muundo wa kamusi | Faida za kamusi | Dhima ya picha katika kamusi | Marejeo | Tazama pia | Viungo vya nje | UrambazajiKuhusu kamusiGo-SwahiliWiki-KamusiKamusi ya Kiswahili na Kiingerezakuihariri na kuongeza habari

Swift 4 - func physicsWorld not invoked on collision? The Next CEO of Stack OverflowHow to call Objective-C code from Swift#ifdef replacement in the Swift language@selector() in Swift?#pragma mark in Swift?Swift for loop: for index, element in array?dispatch_after - GCD in Swift?Swift Beta performance: sorting arraysSplit a String into an array in Swift?The use of Swift 3 @objc inference in Swift 4 mode is deprecated?How to optimize UITableViewCell, because my UITableView lags

Access current req object everywhere in Node.js ExpressWhy are global variables considered bad practice? (node.js)Using req & res across functionsHow do I get the path to the current script with Node.js?What is Node.js' Connect, Express and “middleware”?Node.js w/ express error handling in callbackHow to access the GET parameters after “?” in Express?Modify Node.js req object parametersAccess “app” variable inside of ExpressJS/ConnectJS middleware?Node.js Express app - request objectAngular Http Module considered middleware?Session variables in ExpressJSAdd properties to the req object in expressjs with Typescript