Azure table writing with Process vs ThreadsHow do I update the GUI from another thread?concurrent file processinghow would the number of parallel processes affect the performance of CPU?Does anyone have benchmarks (code & results) comparing performance of Android apps written in Xamarin C# and Java?Sequential part of the program takes more time as process count increasesHow can I use parallel processing inside a WCF service code without using the IIS threads?Improve programs performance by splitting long running task into separate processesRun single-threaded in parallel in AzureIs continually writing to a file detrimental to the performance of a program?Disappointing results with OpenMP

Are cabin dividers used to "hide" the flex of the airplane?

Pristine Bit Checking

Patience, young "Padovan"

Can I find out the caloric content of bread by dehydrating it?

When blogging recipes, how can I support both readers who want the narrative/journey and ones who want the printer-friendly recipe?

What does it exactly mean if a random variable follows a distribution

Why airport relocation isn't done gradually?

How can I add custom success page

Shall I use personal or official e-mail account when registering to external websites for work purpose?

Crop image to path created in TikZ?

Information to fellow intern about hiring?

Is there a way to make member function NOT callable from constructor?

Manga about a female worker who got dragged into another world together with this high school girl and she was just told she's not needed anymore

New order #4: World

Can a planet have a different gravitational pull depending on its location in orbit around its sun?

A poker game description that does not feel gimmicky

Re-submission of rejected manuscript without informing co-authors

Email Account under attack (really) - anything I can do?

Does a dangling wire really electrocute me if I'm standing in water?

How to make payment on the internet without leaving a money trail?

Are white and non-white police officers equally likely to kill black suspects?

Is "plugging out" electronic devices an American expression?

Is this relativistic mass?

Is ipsum/ipsa/ipse a third person pronoun, or can it serve other functions?



Azure table writing with Process vs Threads


How do I update the GUI from another thread?concurrent file processinghow would the number of parallel processes affect the performance of CPU?Does anyone have benchmarks (code & results) comparing performance of Android apps written in Xamarin C# and Java?Sequential part of the program takes more time as process count increasesHow can I use parallel processing inside a WCF service code without using the IIS threads?Improve programs performance by splitting long running task into separate processesRun single-threaded in parallel in AzureIs continually writing to a file detrimental to the performance of a program?Disappointing results with OpenMP






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;








0















I am writing a huge amount of data to an Azure table from a console application in C#. This does the following



  1. Open a connection to an already existing table.

  2. Reads a file using StreamReader

  3. Collect 100 queries at a time and does a batch write to the table.

A single process has a write speed of about 500-700/s without parallelization and to write 1 billion entries would take about 30 days to complete. I did the following to optimize:



  1. Created 20 processes executing the above exe and this ran in parallel without any issues, reducing the write time to 1.5 day. [Ideal scenario which I cannot do unfortunately due to restrictions in our code base]

6 cores/12 logical processors:



+----------+---------------------------------+--------+------------------------------------
| #process | Time per process per 10k writes | W/s | Total time in hours (1.3b queries) |
+----------+---------------------------------+--------+------------------------------------+
| 2 | 14.2s/10k | 1408/s | 256h |
| 4 | 14.5s/10k | 2758/s | 130h |
| 6 | 14.6s/10k | 4109/s | 87h |
| 8 | 15s/10k | 5333/s | 65h |
| 12 | 16.1s/10k | 7453/s | 48h |
| 16 | 17.9s/10K | 8888/s | 42h |
| 18 | 19s/10k | 9473/s | 38h |
| 20 | 21.37s/10k | 9358/s | 39h |
+----------+---------------------------------+--------+------------------------------------



  1. core/1 logical processor took almost the same time. As observed, the write time increases linearly with number of processes and independent of number of cores and logical processors weirdly. Azure tables have a max IOPS of about 20K per second.


  2. Create a set of 20 tasks in the console application. This was not optimal and performance deteriorated as the number of cores reduced or number of threads increased. The best performance was observed for 2 Tasks. I tried changing the min limit in threadPool but that did not change anything.ThreadPool.SetMinThreads(20, 20);


Code:



foreach (var index in processIndex)

Task t = Task.Run(() =>

//gets the appropriate file to read and write to table
string currentFile = string.Format(outFileFormat, index);
Service service = new Service(currentFile);
service.JustReadFile();
);
tasks.Add(t);

tasks.WaitAll();


Performance:



+--------+--------+------------------------------------+
| #tasks | W/s | Total time in hours (1.3b queries) |
+--------+--------+------------------------------------+
| 2 | 1408/s | 256h |
| 16 | ~800/s | ~488h |
| 18 | ~800/s | ~488h |
| 20 | ~800/s | ~488h |
+--------+--------+------------------------------------+


In the above code all I am doing is reading the file for the respective task. Each task has its pre-assigned file to read. No writing to azure table is happening here and that itself is having a detrimental performance on increasing number of tasks. I thought perhaps the tasks are competing for resources or there is too much overhead in context switching. Since each task has its own Service object, I believe that may not be the case. I also do think reading files and creating objects is a I/O intensive task but if 20 processes can handle it, then so could 20 tasks?



What is happening here and how can I solve this using threads/tasks?



thanks!










share|improve this question



















  • 1





    This question is hard to answer because we don't know hardly anything. Why are you using Azure Tables instead of Sql or Cosmos? Is this a one time bulk insert or are you doing this every minute or hour or day? Who is access this data after it's written?

    – Erik Philips
    Mar 22 at 2:16











  • This is on a monthly schedule. The first time would be about ~ 1 billion rows and following that would be 100 million every month. This data will be accessed by multiple internal teams in their applications which are internet facing. Hence the need for fast reads using an Azure table. Anything I can add for more clarity?

    – arjun kumar
    Mar 22 at 3:24











  • I can't promise anything, but take a look at this old cached version of some help

    – Erik Philips
    Mar 22 at 10:40

















0















I am writing a huge amount of data to an Azure table from a console application in C#. This does the following



  1. Open a connection to an already existing table.

  2. Reads a file using StreamReader

  3. Collect 100 queries at a time and does a batch write to the table.

A single process has a write speed of about 500-700/s without parallelization and to write 1 billion entries would take about 30 days to complete. I did the following to optimize:



  1. Created 20 processes executing the above exe and this ran in parallel without any issues, reducing the write time to 1.5 day. [Ideal scenario which I cannot do unfortunately due to restrictions in our code base]

6 cores/12 logical processors:



+----------+---------------------------------+--------+------------------------------------
| #process | Time per process per 10k writes | W/s | Total time in hours (1.3b queries) |
+----------+---------------------------------+--------+------------------------------------+
| 2 | 14.2s/10k | 1408/s | 256h |
| 4 | 14.5s/10k | 2758/s | 130h |
| 6 | 14.6s/10k | 4109/s | 87h |
| 8 | 15s/10k | 5333/s | 65h |
| 12 | 16.1s/10k | 7453/s | 48h |
| 16 | 17.9s/10K | 8888/s | 42h |
| 18 | 19s/10k | 9473/s | 38h |
| 20 | 21.37s/10k | 9358/s | 39h |
+----------+---------------------------------+--------+------------------------------------



  1. core/1 logical processor took almost the same time. As observed, the write time increases linearly with number of processes and independent of number of cores and logical processors weirdly. Azure tables have a max IOPS of about 20K per second.


  2. Create a set of 20 tasks in the console application. This was not optimal and performance deteriorated as the number of cores reduced or number of threads increased. The best performance was observed for 2 Tasks. I tried changing the min limit in threadPool but that did not change anything.ThreadPool.SetMinThreads(20, 20);


Code:



foreach (var index in processIndex)

Task t = Task.Run(() =>

//gets the appropriate file to read and write to table
string currentFile = string.Format(outFileFormat, index);
Service service = new Service(currentFile);
service.JustReadFile();
);
tasks.Add(t);

tasks.WaitAll();


Performance:



+--------+--------+------------------------------------+
| #tasks | W/s | Total time in hours (1.3b queries) |
+--------+--------+------------------------------------+
| 2 | 1408/s | 256h |
| 16 | ~800/s | ~488h |
| 18 | ~800/s | ~488h |
| 20 | ~800/s | ~488h |
+--------+--------+------------------------------------+


In the above code all I am doing is reading the file for the respective task. Each task has its pre-assigned file to read. No writing to azure table is happening here and that itself is having a detrimental performance on increasing number of tasks. I thought perhaps the tasks are competing for resources or there is too much overhead in context switching. Since each task has its own Service object, I believe that may not be the case. I also do think reading files and creating objects is a I/O intensive task but if 20 processes can handle it, then so could 20 tasks?



What is happening here and how can I solve this using threads/tasks?



thanks!










share|improve this question



















  • 1





    This question is hard to answer because we don't know hardly anything. Why are you using Azure Tables instead of Sql or Cosmos? Is this a one time bulk insert or are you doing this every minute or hour or day? Who is access this data after it's written?

    – Erik Philips
    Mar 22 at 2:16











  • This is on a monthly schedule. The first time would be about ~ 1 billion rows and following that would be 100 million every month. This data will be accessed by multiple internal teams in their applications which are internet facing. Hence the need for fast reads using an Azure table. Anything I can add for more clarity?

    – arjun kumar
    Mar 22 at 3:24











  • I can't promise anything, but take a look at this old cached version of some help

    – Erik Philips
    Mar 22 at 10:40













0












0








0








I am writing a huge amount of data to an Azure table from a console application in C#. This does the following



  1. Open a connection to an already existing table.

  2. Reads a file using StreamReader

  3. Collect 100 queries at a time and does a batch write to the table.

A single process has a write speed of about 500-700/s without parallelization and to write 1 billion entries would take about 30 days to complete. I did the following to optimize:



  1. Created 20 processes executing the above exe and this ran in parallel without any issues, reducing the write time to 1.5 day. [Ideal scenario which I cannot do unfortunately due to restrictions in our code base]

6 cores/12 logical processors:



+----------+---------------------------------+--------+------------------------------------
| #process | Time per process per 10k writes | W/s | Total time in hours (1.3b queries) |
+----------+---------------------------------+--------+------------------------------------+
| 2 | 14.2s/10k | 1408/s | 256h |
| 4 | 14.5s/10k | 2758/s | 130h |
| 6 | 14.6s/10k | 4109/s | 87h |
| 8 | 15s/10k | 5333/s | 65h |
| 12 | 16.1s/10k | 7453/s | 48h |
| 16 | 17.9s/10K | 8888/s | 42h |
| 18 | 19s/10k | 9473/s | 38h |
| 20 | 21.37s/10k | 9358/s | 39h |
+----------+---------------------------------+--------+------------------------------------



  1. core/1 logical processor took almost the same time. As observed, the write time increases linearly with number of processes and independent of number of cores and logical processors weirdly. Azure tables have a max IOPS of about 20K per second.


  2. Create a set of 20 tasks in the console application. This was not optimal and performance deteriorated as the number of cores reduced or number of threads increased. The best performance was observed for 2 Tasks. I tried changing the min limit in threadPool but that did not change anything.ThreadPool.SetMinThreads(20, 20);


Code:



foreach (var index in processIndex)

Task t = Task.Run(() =>

//gets the appropriate file to read and write to table
string currentFile = string.Format(outFileFormat, index);
Service service = new Service(currentFile);
service.JustReadFile();
);
tasks.Add(t);

tasks.WaitAll();


Performance:



+--------+--------+------------------------------------+
| #tasks | W/s | Total time in hours (1.3b queries) |
+--------+--------+------------------------------------+
| 2 | 1408/s | 256h |
| 16 | ~800/s | ~488h |
| 18 | ~800/s | ~488h |
| 20 | ~800/s | ~488h |
+--------+--------+------------------------------------+


In the above code all I am doing is reading the file for the respective task. Each task has its pre-assigned file to read. No writing to azure table is happening here and that itself is having a detrimental performance on increasing number of tasks. I thought perhaps the tasks are competing for resources or there is too much overhead in context switching. Since each task has its own Service object, I believe that may not be the case. I also do think reading files and creating objects is a I/O intensive task but if 20 processes can handle it, then so could 20 tasks?



What is happening here and how can I solve this using threads/tasks?



thanks!










share|improve this question
















I am writing a huge amount of data to an Azure table from a console application in C#. This does the following



  1. Open a connection to an already existing table.

  2. Reads a file using StreamReader

  3. Collect 100 queries at a time and does a batch write to the table.

A single process has a write speed of about 500-700/s without parallelization and to write 1 billion entries would take about 30 days to complete. I did the following to optimize:



  1. Created 20 processes executing the above exe and this ran in parallel without any issues, reducing the write time to 1.5 day. [Ideal scenario which I cannot do unfortunately due to restrictions in our code base]

6 cores/12 logical processors:



+----------+---------------------------------+--------+------------------------------------
| #process | Time per process per 10k writes | W/s | Total time in hours (1.3b queries) |
+----------+---------------------------------+--------+------------------------------------+
| 2 | 14.2s/10k | 1408/s | 256h |
| 4 | 14.5s/10k | 2758/s | 130h |
| 6 | 14.6s/10k | 4109/s | 87h |
| 8 | 15s/10k | 5333/s | 65h |
| 12 | 16.1s/10k | 7453/s | 48h |
| 16 | 17.9s/10K | 8888/s | 42h |
| 18 | 19s/10k | 9473/s | 38h |
| 20 | 21.37s/10k | 9358/s | 39h |
+----------+---------------------------------+--------+------------------------------------



  1. core/1 logical processor took almost the same time. As observed, the write time increases linearly with number of processes and independent of number of cores and logical processors weirdly. Azure tables have a max IOPS of about 20K per second.


  2. Create a set of 20 tasks in the console application. This was not optimal and performance deteriorated as the number of cores reduced or number of threads increased. The best performance was observed for 2 Tasks. I tried changing the min limit in threadPool but that did not change anything.ThreadPool.SetMinThreads(20, 20);


Code:



foreach (var index in processIndex)

Task t = Task.Run(() =>

//gets the appropriate file to read and write to table
string currentFile = string.Format(outFileFormat, index);
Service service = new Service(currentFile);
service.JustReadFile();
);
tasks.Add(t);

tasks.WaitAll();


Performance:



+--------+--------+------------------------------------+
| #tasks | W/s | Total time in hours (1.3b queries) |
+--------+--------+------------------------------------+
| 2 | 1408/s | 256h |
| 16 | ~800/s | ~488h |
| 18 | ~800/s | ~488h |
| 20 | ~800/s | ~488h |
+--------+--------+------------------------------------+


In the above code all I am doing is reading the file for the respective task. Each task has its pre-assigned file to read. No writing to azure table is happening here and that itself is having a detrimental performance on increasing number of tasks. I thought perhaps the tasks are competing for resources or there is too much overhead in context switching. Since each task has its own Service object, I believe that may not be the case. I also do think reading files and creating objects is a I/O intensive task but if 20 processes can handle it, then so could 20 tasks?



What is happening here and how can I solve this using threads/tasks?



thanks!







c# parallel-processing task






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Mar 22 at 2:14









Erik Philips

41.7k694126




41.7k694126










asked Mar 22 at 1:48









arjun kumararjun kumar

1




1







  • 1





    This question is hard to answer because we don't know hardly anything. Why are you using Azure Tables instead of Sql or Cosmos? Is this a one time bulk insert or are you doing this every minute or hour or day? Who is access this data after it's written?

    – Erik Philips
    Mar 22 at 2:16











  • This is on a monthly schedule. The first time would be about ~ 1 billion rows and following that would be 100 million every month. This data will be accessed by multiple internal teams in their applications which are internet facing. Hence the need for fast reads using an Azure table. Anything I can add for more clarity?

    – arjun kumar
    Mar 22 at 3:24











  • I can't promise anything, but take a look at this old cached version of some help

    – Erik Philips
    Mar 22 at 10:40












  • 1





    This question is hard to answer because we don't know hardly anything. Why are you using Azure Tables instead of Sql or Cosmos? Is this a one time bulk insert or are you doing this every minute or hour or day? Who is access this data after it's written?

    – Erik Philips
    Mar 22 at 2:16











  • This is on a monthly schedule. The first time would be about ~ 1 billion rows and following that would be 100 million every month. This data will be accessed by multiple internal teams in their applications which are internet facing. Hence the need for fast reads using an Azure table. Anything I can add for more clarity?

    – arjun kumar
    Mar 22 at 3:24











  • I can't promise anything, but take a look at this old cached version of some help

    – Erik Philips
    Mar 22 at 10:40







1




1





This question is hard to answer because we don't know hardly anything. Why are you using Azure Tables instead of Sql or Cosmos? Is this a one time bulk insert or are you doing this every minute or hour or day? Who is access this data after it's written?

– Erik Philips
Mar 22 at 2:16





This question is hard to answer because we don't know hardly anything. Why are you using Azure Tables instead of Sql or Cosmos? Is this a one time bulk insert or are you doing this every minute or hour or day? Who is access this data after it's written?

– Erik Philips
Mar 22 at 2:16













This is on a monthly schedule. The first time would be about ~ 1 billion rows and following that would be 100 million every month. This data will be accessed by multiple internal teams in their applications which are internet facing. Hence the need for fast reads using an Azure table. Anything I can add for more clarity?

– arjun kumar
Mar 22 at 3:24





This is on a monthly schedule. The first time would be about ~ 1 billion rows and following that would be 100 million every month. This data will be accessed by multiple internal teams in their applications which are internet facing. Hence the need for fast reads using an Azure table. Anything I can add for more clarity?

– arjun kumar
Mar 22 at 3:24













I can't promise anything, but take a look at this old cached version of some help

– Erik Philips
Mar 22 at 10:40





I can't promise anything, but take a look at this old cached version of some help

– Erik Philips
Mar 22 at 10:40












0






active

oldest

votes












Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55291748%2fazure-table-writing-with-process-vs-threads%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55291748%2fazure-table-writing-with-process-vs-threads%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Kamusi Yaliyomo Aina za kamusi | Muundo wa kamusi | Faida za kamusi | Dhima ya picha katika kamusi | Marejeo | Tazama pia | Viungo vya nje | UrambazajiKuhusu kamusiGo-SwahiliWiki-KamusiKamusi ya Kiswahili na Kiingerezakuihariri na kuongeza habari

Swift 4 - func physicsWorld not invoked on collision? The Next CEO of Stack OverflowHow to call Objective-C code from Swift#ifdef replacement in the Swift language@selector() in Swift?#pragma mark in Swift?Swift for loop: for index, element in array?dispatch_after - GCD in Swift?Swift Beta performance: sorting arraysSplit a String into an array in Swift?The use of Swift 3 @objc inference in Swift 4 mode is deprecated?How to optimize UITableViewCell, because my UITableView lags

Access current req object everywhere in Node.js ExpressWhy are global variables considered bad practice? (node.js)Using req & res across functionsHow do I get the path to the current script with Node.js?What is Node.js' Connect, Express and “middleware”?Node.js w/ express error handling in callbackHow to access the GET parameters after “?” in Express?Modify Node.js req object parametersAccess “app” variable inside of ExpressJS/ConnectJS middleware?Node.js Express app - request objectAngular Http Module considered middleware?Session variables in ExpressJSAdd properties to the req object in expressjs with Typescript