Azure table writing with Process vs Threads
I am writing a huge amount of data to an Azure table from a C# console application. The application does the following:
- Opens a connection to an already existing table.
- Reads a file using a StreamReader.
- Collects 100 queries at a time and does a batch write to the table (see the sketch after this list).
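For concreteness, here is a minimal sketch of that read-and-batch loop, assuming the classic WindowsAzure.Storage table SDK; connectionString, inputPath, and the ParseEntity helper are hypothetical stand-ins for whatever the real code uses:

using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// connectionString, inputPath and ParseEntity are placeholders.
CloudTable table = CloudStorageAccount.Parse(connectionString)
    .CreateCloudTableClient()
    .GetTableReference("MyTable"); // the already existing table

var batch = new TableBatchOperation();
using (var reader = new StreamReader(inputPath))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        batch.Add(TableOperation.InsertOrReplace(ParseEntity(line)));
        // Azure limits a batch to 100 operations, all in one partition.
        if (batch.Count == 100)
        {
            table.ExecuteBatch(batch);
            batch = new TableBatchOperation();
        }
    }
    if (batch.Count > 0)
        table.ExecuteBatch(batch); // flush the final partial batch
}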
A single process writes about 500-700 entries/s without parallelization, so writing 1 billion entries would take about 30 days. I did the following to optimize:
- Created 20 processes executing the above exe. These ran in parallel without any issues and reduced the write time to about 1.5 days. [This is the ideal scenario, which I unfortunately cannot use due to restrictions in our code base.]
Measurements on a machine with 6 cores / 12 logical processors:
+----------+---------------------------------+--------+------------------------------------+
| #process | Time per process per 10k writes | W/s    | Total time in hours (1.3b queries) |
+----------+---------------------------------+--------+------------------------------------+
| 2        | 14.2s/10k                       | 1408/s | 256h                               |
| 4        | 14.5s/10k                       | 2758/s | 130h                               |
| 6        | 14.6s/10k                       | 4109/s | 87h                                |
| 8        | 15s/10k                         | 5333/s | 65h                                |
| 12       | 16.1s/10k                       | 7453/s | 48h                                |
| 16       | 17.9s/10k                       | 8888/s | 42h                                |
| 18       | 19s/10k                         | 9473/s | 38h                                |
| 20       | 21.37s/10k                      | 9358/s | 39h                                |
+----------+---------------------------------+--------+------------------------------------+
A machine with 1 core/1 logical processor took almost the same time per process. As the table shows, throughput scales almost linearly with the number of processes and, oddly, appears independent of the number of cores and logical processors. Azure tables have a maximum throughput of about 20K operations per second; at that cap, 1.3 billion writes would take at least 1.3e9 / 20,000 ≈ 65,000 s ≈ 18 h, so the 20-process run is within about 2x of the service ceiling.
Next, I created a set of 20 tasks in the same console application instead. This was not optimal: performance deteriorated as the number of cores went down or the number of tasks went up. The best performance was observed with 2 tasks. I tried raising the thread pool minimum, but that did not change anything:
ThreadPool.SetMinThreads(20, 20);
Code:
var tasks = new List<Task>();
foreach (var index in processIndex)
{
    Task t = Task.Run(() =>
    {
        // Gets the appropriate file to read and write to the table
        string currentFile = string.Format(outFileFormat, index);
        Service service = new Service(currentFile);
        service.JustReadFile();
    });
    tasks.Add(t);
}
Task.WaitAll(tasks.ToArray());
Performance:
+--------+--------+------------------------------------+
| #tasks | W/s    | Total time in hours (1.3b queries) |
+--------+--------+------------------------------------+
| 2      | 1408/s | 256h                               |
| 16     | ~800/s | ~488h                              |
| 18     | ~800/s | ~488h                              |
| 20     | ~800/s | ~488h                              |
+--------+--------+------------------------------------+
In the above code, all each task does is read its pre-assigned file; no writing to the Azure table happens at all, and even that degrades as the number of tasks increases. I thought perhaps the tasks were competing for resources, or that there was too much context-switching overhead, but since each task has its own Service object I don't believe that is the case. Reading files and creating objects is I/O-intensive work, but if 20 processes can handle it, shouldn't 20 tasks be able to as well?
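One variation that might be worth testing (a sketch under the assumption that thread-pool scheduling is the bottleneck; I have not measured this): TaskCreationOptions.LongRunning hints the scheduler to give each task its own dedicated thread, which makes the task setup behave more like the 20-process run.

// Sketch only: LongRunning asks the scheduler for a dedicated thread per
// task instead of a shared thread-pool thread, mimicking one process per
// file. outFileFormat, processIndex and Service are as in the code above.
var tasks = new List<Task>();
foreach (var index in processIndex)
{
    tasks.Add(Task.Factory.StartNew(() =>
    {
        string currentFile = string.Format(outFileFormat, index);
        new Service(currentFile).JustReadFile();
    }, TaskCreationOptions.LongRunning));
}
Task.WaitAll(tasks.ToArray());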
What is happening here and how can I solve this using threads/tasks?
Thanks!
c# parallel-processing task
This question is hard to answer because we hardly know anything. Why are you using Azure Tables instead of SQL or Cosmos? Is this a one-time bulk insert, or are you doing this every minute, hour, or day? Who is accessing this data after it's written?
– Erik Philips
Mar 22 at 2:16
This is on a monthly schedule. The first run would be about 1 billion rows, and after that about 100 million every month. This data will be accessed by multiple internal teams in their applications, which are internet facing; hence the need for fast reads from an Azure table. Anything I can add for more clarity?
– arjun kumar
Mar 22 at 3:24
I can't promise anything, but take a look at this old cached version of some help
– Erik Philips
Mar 22 at 10:40
asked Mar 22 at 1:48 by arjun kumar
edited Mar 22 at 2:14 by Erik Philips
0 answers