OpenVINO unable to get optimum performance while running multiple inference engines


I am running multiple Python processes (4 in this case, using the multiprocessing module) for person detection (using an SSD MobileNet model), each with its own OpenVINO inference engine. I am getting a very low FPS (not more than 10) for each process. My suspicion is that the CPUs are not being utilized optimally, because each engine spawns a large number of threads, which adds overhead, and because the processes share CPUs.

Also, for a single process I get up to 60 FPS with OMP_NUM_THREADS set to 4.

My CPU details are:
2 sockets
4 cores per socket
1 thread per core
Total: 8 CPUs

So what would be:

  1. The optimal value for OMP_NUM_THREADS in this case?

  2. A way to avoid sharing CPUs across processes?

Currently I am playing with the OMP_NUM_THREADS and KMP_AFFINITY variables, but I am only setting their values by trial and error. Any detail on how to set them would be really helpful. Thanks
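One way to approach question 2 is to pin each worker process to its own slice of cores and cap its thread count. A minimal sketch, assuming Linux (os.sched_setaffinity is Linux-only), assuming the OpenVINO engine is created only inside each worker so the environment variable takes effect before the OpenMP runtime starts, and with run_person_detection as a hypothetical stand-in for the actual detection loop:

```python
import os
import multiprocessing as mp

NUM_WORKERS = 4
CORES_PER_WORKER = 2  # 8 CPUs split across 4 workers

def worker(worker_id):
    # OpenMP reads this when its runtime initializes, so it must be set
    # before the OpenVINO inference engine is imported/created here.
    os.environ["OMP_NUM_THREADS"] = str(CORES_PER_WORKER)

    # Pin this process to its own two cores; the worker_id -> core-id
    # mapping is an assumption, check the real socket/core layout first.
    first = worker_id * CORES_PER_WORKER
    os.sched_setaffinity(0, {first, first + 1})

    run_person_detection(worker_id)  # hypothetical per-process detection loop

if __name__ == "__main__":
    procs = [mp.Process(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

With this layout each engine sees only its own cores, so the four processes no longer compete for the same CPUs.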










Tags: python-multithreading openvino
asked Mar 28 at 4:38 by Rachit Tayal
          2 Answers
When running inference on multiple networks, you may try setting OMP_WAIT_POLICY to PASSIVE.

By the way, OpenVINO 2019 R1 moved from OpenMP to TBB, which may give better efficiency for a pipeline of deep learning networks.

answered Apr 6 at 6:52 by Dmitry Kurtaev
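A minimal sketch of applying this, assuming the 2019-era Python API (openvino.inference_engine); the key point is that the variable must be in the environment before the OpenMP runtime loads:

```python
import os

# OMP_WAIT_POLICY is read when the OpenMP runtime starts, so set it before
# the inference engine library is loaded. Exporting it in the shell
# (OMP_WAIT_POLICY=PASSIVE python app.py) works just as well.
os.environ["OMP_WAIT_POLICY"] = "PASSIVE"

from openvino.inference_engine import IECore  # import only after setting the policy

ie = IECore()
```

PASSIVE stops idle OpenMP threads from busy-waiting, which matters when several engines oversubscribe the same cores.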






If you are using the same model for all the processes, consider using OpenVINO multi-stream inference. With it you load a single network and then create multiple infer requests, which gives better CPU utilization (compared to running one infer request across multiple cores) and, as a result, better throughput.

To see how to use multi-stream inference, take a look at the inference_engine/samples/python_samples/benchmark_app benchmark sample.

You can also use the benchmark sample to do a grid search for an optimal configuration (number of streams, batch size).

answered Apr 17 at 21:02 by Dmitry
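A minimal sketch of that idea, assuming the 2019-era Python API; the model paths, the stream count, and the frames iterable are placeholders to adapt:

```python
from openvino.inference_engine import IECore, IENetwork

N_STREAMS = 4  # tune with a grid search, as suggested above

ie = IECore()
net = IENetwork(model="person-detection.xml",   # placeholder model files
                weights="person-detection.bin")
input_blob = next(iter(net.inputs))

# One network, loaded once, split into several CPU streams with one infer
# request per stream, instead of one full engine per process.
exec_net = ie.load_network(
    network=net,
    device_name="CPU",
    config={"CPU_THROUGHPUT_STREAMS": str(N_STREAMS)},
    num_requests=N_STREAMS,
)

# Feed frames to the requests round-robin; read detections back from
# request.outputs once wait() reports the request finished.
for i, frame in enumerate(frames):  # `frames`: placeholder frame source
    request = exec_net.requests[i % N_STREAMS]
    request.wait()  # returns immediately for a request not yet started
    request.async_infer({input_blob: frame})
```

Compared with four separate processes, this keeps one copy of the network weights in memory and lets the runtime balance the streams across cores.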





