Azure table writing with Process vs Threads
I am writing a huge amount of data to an Azure table from a C# console application. The application does the following:
- Opens a connection to an already existing table.
- Reads a file using a StreamReader.
- Collects 100 queries at a time and does a batch write to the table (see the sketch after this list).
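For concreteness, here is a minimal sketch of that read-and-batch loop, assuming the classic WindowsAzure.Storage table SDK; connectionString, inputPath, and the ParseEntity helper are hypothetical stand-ins for whatever the real code uses:

using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// connectionString, inputPath and ParseEntity are placeholders.
CloudTable table = CloudStorageAccount.Parse(connectionString)
    .CreateCloudTableClient()
    .GetTableReference("MyTable"); // the already existing table

var batch = new TableBatchOperation();
using (var reader = new StreamReader(inputPath))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        batch.Add(TableOperation.InsertOrReplace(ParseEntity(line)));
        // Azure limits a batch to 100 operations, all in one partition.
        if (batch.Count == 100)
        {
            table.ExecuteBatch(batch);
            batch = new TableBatchOperation();
        }
    }
    if (batch.Count > 0)
        table.ExecuteBatch(batch); // flush the final partial batch
}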
A single process writes about 500-700 entries/s without parallelization, so writing 1 billion entries would take about 30 days. I did the following to optimize:
- Created 20 processes executing the above exe. These ran in parallel without any issues and reduced the write time to about 1.5 days. [This is the ideal scenario, which I unfortunately cannot use due to restrictions in our code base.]
Measurements on a machine with 6 cores / 12 logical processors:
+----------+---------------------------------+--------+------------------------------------+
| #process | Time per process per 10k writes | W/s    | Total time in hours (1.3b queries) |
+----------+---------------------------------+--------+------------------------------------+
| 2        | 14.2s/10k                       | 1408/s | 256h                               |
| 4        | 14.5s/10k                       | 2758/s | 130h                               |
| 6        | 14.6s/10k                       | 4109/s | 87h                                |
| 8        | 15s/10k                         | 5333/s | 65h                                |
| 12       | 16.1s/10k                       | 7453/s | 48h                                |
| 16       | 17.9s/10k                       | 8888/s | 42h                                |
| 18       | 19s/10k                         | 9473/s | 38h                                |
| 20       | 21.37s/10k                      | 9358/s | 39h                                |
+----------+---------------------------------+--------+------------------------------------+
A machine with 1 core/1 logical processor took almost the same time per process. As the table shows, throughput scales almost linearly with the number of processes and, oddly, appears independent of the number of cores and logical processors. Azure tables have a maximum throughput of about 20K operations per second; at that cap, 1.3 billion writes would take at least 1.3e9 / 20,000 ≈ 65,000 s ≈ 18 h, so the 20-process run is within about 2x of the service ceiling.
Next, I created a set of 20 tasks in the same console application instead. This was not optimal: performance deteriorated as the number of cores went down or the number of tasks went up. The best performance was observed with 2 tasks. I tried raising the thread pool minimum, but that did not change anything:
ThreadPool.SetMinThreads(20, 20);
Code:
var tasks = new List<Task>();
foreach (var index in processIndex)
{
    Task t = Task.Run(() =>
    {
        // Gets the appropriate file to read and write to the table
        string currentFile = string.Format(outFileFormat, index);
        Service service = new Service(currentFile);
        service.JustReadFile();
    });
    tasks.Add(t);
}
Task.WaitAll(tasks.ToArray());
Performance:
+--------+--------+------------------------------------+
| #tasks | W/s    | Total time in hours (1.3b queries) |
+--------+--------+------------------------------------+
| 2      | 1408/s | 256h                               |
| 16     | ~800/s | ~488h                              |
| 18     | ~800/s | ~488h                              |
| 20     | ~800/s | ~488h                              |
+--------+--------+------------------------------------+
In the above code, all each task does is read its pre-assigned file; no writing to the Azure table happens at all, and even that degrades as the number of tasks increases. I thought perhaps the tasks were competing for resources, or that there was too much context-switching overhead, but since each task has its own Service object I don't believe that is the case. Reading files and creating objects is I/O-intensive work, but if 20 processes can handle it, shouldn't 20 tasks be able to as well?
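One variation that might be worth testing (a sketch under the assumption that thread-pool scheduling is the bottleneck; I have not measured this): TaskCreationOptions.LongRunning hints the scheduler to give each task its own dedicated thread, which makes the task setup behave more like the 20-process run.

// Sketch only: LongRunning asks the scheduler for a dedicated thread per
// task instead of a shared thread-pool thread, mimicking one process per
// file. outFileFormat, processIndex and Service are as in the code above.
var tasks = new List<Task>();
foreach (var index in processIndex)
{
    tasks.Add(Task.Factory.StartNew(() =>
    {
        string currentFile = string.Format(outFileFormat, index);
        new Service(currentFile).JustReadFile();
    }, TaskCreationOptions.LongRunning));
}
Task.WaitAll(tasks.ToArray());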
What is happening here and how can I solve this using threads/tasks?
Thanks!
c# parallel-processing task
This question is hard to answer because we hardly know anything. Why are you using Azure Tables instead of SQL or Cosmos? Is this a one-time bulk insert, or are you doing this every minute, hour, or day? Who is accessing this data after it's written?
– Erik Philips
Mar 22 at 2:16
This is on a monthly schedule. The first run would be about 1 billion rows, and after that about 100 million every month. This data will be accessed by multiple internal teams in their applications, which are internet facing; hence the need for fast reads from an Azure table. Anything I can add for more clarity?
– arjun kumar
Mar 22 at 3:24
I can't promise anything, but take a look at this old cached version of some help
– Erik Philips
Mar 22 at 10:40
asked Mar 22 at 1:48 by arjun kumar
edited Mar 22 at 2:14 by Erik Philips
0 answers