Binning data and calculating MAE for each bin in Python


I have two arrays, Obs and abs_error.

I want to use Obs to define the bins. For example, where Obs is between 1 and 2, the corresponding abs_error values go into bin #1; where Obs is between 2 and 3, they go into bin #2; and so on.

Once abs_error has been binned by Obs, I want to calculate the mean of each bin and plot those means on the y-axis against the bins on the x-axis.

How do I bin abs_error using bins defined by Obs, and how do I then calculate the mean of each bin?



Right now I have the following. I can't print the whole arrays because they are very big, so only their first and last three values are shown:

import numpy as np

# Truncated; the real arrays are much longer.
# abs_error = [2.214033842086792, 2.65031099319458, 2.021354913711548, ..., 2.831442356109619, 1.9227538108825684, 0.19358205795288086]
# obs = [3.3399999141693115, 1.440000057220459, 1.2799999713897705, ..., 5.78000020980835, 6.050000190734863, 7.75]
bin_boundaries = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0])

idx = np.digitize(obs, bin_boundaries)
mn = np.bincount(idx, abs_error) / np.bincount(idx)
print(mn)

which prints:

[83.09254473 3.18577858 2.82887524 2.78532805 2.43264693 1.96835116 1.77645996 1.66138196 1.5972414 1.57512014 1.53094066 1.7965252 1.98050336 2.29916244 3.06640482 4.66769505 3.16787195]










      python numpy scipy statistics binning






      asked Mar 27 at 22:22 by HM14 (edited Apr 1 at 7:10)

























          1 Answer

































          If your bins all have the same width, as in your example, you can obtain the bin indices from Obs by floor division:



          idx = (Obs // 1).astype(int)
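A minimal sketch of the equal-width case generalized to an arbitrary lower edge and bin width. The function name `equal_width_bin_index` and its parameters `lo` and `width` are hypothetical, chosen for illustration, and the sample values are rounded from the question's. Note that these floor-division indices start at 0 for the first bin, whereas `np.digitize` returns 1 for values in the first bin, so the two approaches are offset by one.

```python
import numpy as np


def equal_width_bin_index(values, lo=0.0, width=1.0):
    """0-based index of the equal-width bin each value falls in.

    Bin k covers [lo + k*width, lo + (k+1)*width). With lo=0 and width=1 this
    is the same as (values // 1).astype(int). Mask NaNs first: casting NaN to
    int does not produce a meaningful index.
    """
    values = np.asarray(values, dtype=float)
    return ((values - lo) // width).astype(int)


obs = np.array([3.34, 1.44, 1.28, 5.78, 6.05, 7.75])
print(equal_width_bin_index(obs))             # [3 1 1 5 6 7]
print(equal_width_bin_index(obs, width=2.0))  # [1 0 0 2 3 3]
```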


          If the bins do not all have the same width, use np.digitize instead:



          idx = np.digitize(Obs, bin_boundaries)
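A small check of which index `np.digitize` assigns with the question's edges 0 through 20 (the sample values are made up to hit the edge cases). With the default `right=False`, index `i` means `bin_boundaries[i-1] <= x < bin_boundaries[i]`; index 0 collects values below the first edge, and index `len(bin_boundaries)` collects values at or above the last edge.

```python
import numpy as np

# The question's edges 0, 1, ..., 20: 21 edges, 20 real bins.
bin_boundaries = np.arange(0.0, 21.0, 1.0)

# Made-up values chosen to hit the edge cases.
obs = np.array([-0.5, 0.0, 0.99, 1.0, 3.34, 19.99, 20.0, 25.0])
idx = np.digitize(obs, bin_boundaries)  # default right=False
print(idx.tolist())  # [0, 1, 1, 2, 4, 20, 21, 21]
```

So with these edges, values between 1 and 2 get index 2, and the first element of `np.bincount(idx)` corresponds to index 0, i.e. values below the first edge rather than a real bin.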


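          Once you have the indices, pass them to np.bincount twice: once weighted by abs_error to get the per-bin sums, and once unweighted to get the per-bin counts. Their ratio is the per-bin mean: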



          mn = np.bincount(idx, abs_error) / np.bincount(idx)
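Putting the pieces together, a sketch of a helper (the name `binned_mean` is hypothetical) that returns the per-bin mean of abs_error, i.e. the per-bin MAE from the title, along with bin centers for the x-axis. It passes `minlength` to `np.bincount` so that empty bins at the top still get a slot (without it, the result stops at the highest index that actually occurs), and it drops `np.digitize`'s underflow and overflow slots so the output lines up with the real bins.

```python
import numpy as np


def binned_mean(obs, abs_error, bin_boundaries):
    """Mean of abs_error within each bin of obs (i.e. the per-bin MAE).

    Returns (centers, means, counts) for the len(bin_boundaries) - 1 real bins.
    np.digitize's underflow slot (index 0) and overflow slot
    (index len(bin_boundaries)) are dropped. Empty bins get a NaN mean.
    """
    obs = np.asarray(obs, dtype=float)
    abs_error = np.asarray(abs_error, dtype=float)
    bin_boundaries = np.asarray(bin_boundaries, dtype=float)

    idx = np.digitize(obs, bin_boundaries)
    n_slots = len(bin_boundaries) + 1  # underflow + real bins + overflow

    # minlength guarantees one slot per index, even if the top bins are empty.
    sums = np.bincount(idx, weights=abs_error, minlength=n_slots)
    counts = np.bincount(idx, minlength=n_slots)

    # Empty bins give 0 / 0 -> NaN; silence the RuntimeWarning for that case.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts

    centers = 0.5 * (bin_boundaries[:-1] + bin_boundaries[1:])
    return centers, means[1:-1], counts[1:-1]


# Small made-up example: 25.0 falls above the last edge and is dropped.
obs = np.array([0.5, 0.7, 1.2, 3.9, 25.0])
abs_error = np.array([1.0, 3.0, 2.0, 4.0, 100.0])
centers, means, counts = binned_mean(obs, abs_error, [0.0, 1.0, 2.0, 3.0, 4.0])
print(centers)  # [0.5 1.5 2.5 3.5]
print(counts)   # [2 1 0 1]
print(means)    # bins 0-1, 1-2, 3-4 -> 2.0, 2.0, 4.0; bin 2-3 is empty -> nan
```

For the plot the question asks for, the returned `centers` and `means` can be passed straight to a plotting call such as matplotlib's `plt.plot(centers, means, "o-")`.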






























          • I think I must be doing something wrong: I get an odd mean for the first bin. I have edited my question above to include the code from your answer. The mean of the first bin is 83, which can't be right, because the maximum value of obs is 17 and the maximum value of abs_error is 13, so I don't see how any bin's mean could exceed that. Can you tell what I am doing wrong?

            – HM14
            Apr 1 at 7:16











          • Ignore my previous comment. It turns out there were some NaNs in the arrays that were corrupting the results. Masking out the NaN entries seems to have fixed the issue. However, if I get NaNs in the upper part of mn, does that mean those bins have no counts? For example, I get mn = [nan 3.18577858 2.82887524 ... 1.57512014 1.53094066 1.7965252 1.98050336 2.29916244 3.06640482 4.66769505 3.16787195 nan nan nan nan nan] and np.bincount(idx) = [0 157 458 677 855 920 979 ... 4 2 2 0 0 0 0 0 3133]

            – HM14
            Apr 1 at 7:47











          • @HM14 Yep, the mean of an empty bin is undefined, so NaN is the appropriate answer. Technically, what happens is a 0 / 0 where the first zero is the sum and the second zero is the element count.

            – Paul Panzer
            Apr 1 at 8:37
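A sketch that applies both fixes discussed in the comments (the name `masked_binned_mean` is hypothetical). It drops every pair in which either value is NaN or infinite before binning, since neither `np.digitize` nor `np.bincount` skips NaN: a NaN weight turns its bin's sum into NaN, and a NaN in obs is still assigned some index. It then fills empty bins with NaN explicitly via `np.divide(..., where=counts > 0)` instead of computing 0 / 0, which also avoids NumPy's invalid-value RuntimeWarning. The returned arrays keep every slot, including index 0 (below the first edge) and the last index (at or above the last edge), so they line up with the mn and np.bincount(idx) arrays shown above.

```python
import numpy as np


def masked_binned_mean(obs, abs_error, bin_boundaries):
    """Per-bin mean of abs_error, ignoring pairs where either value is NaN/inf.

    Returns (means, counts) over all np.digitize slots: index 0 is below the
    first edge, index len(bin_boundaries) is at or above the last edge.
    """
    obs = np.asarray(obs, dtype=float)
    abs_error = np.asarray(abs_error, dtype=float)

    # Mask both arrays with the same boolean so the pairs stay aligned.
    keep = np.isfinite(obs) & np.isfinite(abs_error)
    obs, abs_error = obs[keep], abs_error[keep]

    idx = np.digitize(obs, bin_boundaries)
    n_slots = len(bin_boundaries) + 1
    sums = np.bincount(idx, weights=abs_error, minlength=n_slots)
    counts = np.bincount(idx, minlength=n_slots)

    # Only divide where a bin has members; empty bins keep the NaN fill value.
    means = np.full(n_slots, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means, counts


obs = np.array([0.5, np.nan, 1.5, 1.7])
abs_error = np.array([2.0, 1.0, np.nan, 4.0])
means, counts = masked_binned_mean(obs, abs_error, [0.0, 1.0, 2.0, 3.0])
print(counts)  # [0 1 1 0 0]
print(means)   # slots 1 and 2 hold 2.0 and 4.0; the empty slots are nan
```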











          answered Mar 27 at 23:22 by Paul Panzer














