Binning data and calculating MAE for each bin in Python
I have two arrays:
Obs=([])
abs_error=([])
I want to use Obs to define the bins. For example, where Obs is between 1 and 2, abs_error goes into bin #1; where Obs is between 2 and 3, abs_error goes into bin #2; and so on.
Once abs_error has been binned by Obs, I want to calculate the mean of each bin and then plot the bin means on the y-axis against the bins on the x-axis.
How do I bin abs_error using bins defined on Obs? And how do I calculate the mean of each bin once this is done?
Right now I have:
abs_error=np.array([2.214033842086792 2.65031099319458 2.021354913711548 ... 2.831442356109619 1.9227538108825684 0.19358205795288086])
obs=np.array([3.3399999141693115 1.440000057220459 1.2799999713897705 ... 5.78000020980835 6.050000190734863 7.75])
bin_boundaries=np.array([0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0,16.0,17.0,18.0,19.0,20.0])
idx = np.digitize(obs, bin_boundaries)
mn = np.bincount(idx,abs_error) / np.bincount(idx)
print(mn)
[83.09254473 3.18577858 2.82887524 2.78532805 2.43264693 1.96835116 1.77645996 1.66138196 1.5972414 1.57512014 1.53094066 1.7965252 1.98050336 2.29916244 3.06640482 4.66769505 3.16787195]
I can't print the whole arrays because they are very big.
python numpy scipy statistics binning
asked Mar 27 at 22:22 by HM14 (edited Apr 1 at 7:10)
1 Answer
If your bins are all the same size, you can use floor division to obtain bin indices from Obs. In your example, with bins of width 1:
idx = (Obs // 1).astype(int)
If not, use np.digitize instead:
idx = np.digitize(Obs, bin_boundaries)
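The two approaches number the bins differently, which matters when reading the result. The sketch below uses made-up sample values (not the asker's data) and unit-width edges starting at 0. Under those assumptions, floor division labels the first bin 0, while np.digitize labels it 1 and reserves index 0 for values below the first edge and index len(bin_boundaries) for values at or above the last edge.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration.
obs = np.array([0.5, 1.44, 1.28, 3.34, 5.78])
bin_boundaries = np.arange(0.0, 7.0, 1.0)  # edges 0, 1, ..., 6

# Floor division: the bin [0, 1) gets index 0.
idx_floor = (obs // 1).astype(int)

# np.digitize: the bin [0, 1) gets index 1; index 0 means "below the first edge".
idx_digitize = np.digitize(obs, bin_boundaries)

print(idx_floor)     # [0 1 1 3 5]
print(idx_digitize)  # [1 2 2 4 6]
```

With np.digitize, the first entry of the np.bincount result therefore refers to values below the first edge, not to the first real bin.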
Once you have the indices, use them with np.bincount to obtain the means:
mn = np.bincount(idx, abs_error) / np.bincount(idx)
answered Mar 27 at 23:22 by Paul Panzer
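For reference, the steps in the answer can be combined into one small function. This is only a sketch: the helper name binned_mean and the sample data are hypothetical, and it assumes the inputs contain no NaNs. minlength is passed so that the sum and count arrays always have the same length, and the two overflow slots that np.digitize creates are dropped so that each returned entry corresponds to one real bin.

```python
import numpy as np

def binned_mean(obs, abs_error, bin_boundaries):
    """Mean of abs_error within each bin [edge_i, edge_{i+1}) defined on obs."""
    obs = np.asarray(obs, dtype=float)
    abs_error = np.asarray(abs_error, dtype=float)
    edges = np.asarray(bin_boundaries, dtype=float)

    # np.digitize yields indices 0..len(edges), so there are len(edges) + 1 slots.
    n_slots = len(edges) + 1
    idx = np.digitize(obs, edges)
    sums = np.bincount(idx, weights=abs_error, minlength=n_slots)
    counts = np.bincount(idx, minlength=n_slots)

    # Empty bins give 0 / 0 = NaN; silence the warning for that case.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts

    # Drop slot 0 (below the first edge) and the last slot (at or above the last edge).
    return means[1:-1], counts[1:-1]

# Hypothetical sample data, chosen only for illustration.
obs = np.array([0.5, 1.44, 1.28, 3.34, 5.78])
abs_error = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
edges = np.arange(0.0, 7.0, 1.0)

means, counts = binned_mean(obs, abs_error, edges)
centers = 0.5 * (edges[:-1] + edges[1:])  # x positions for plotting
print(centers)
print(means)
print(counts)
```

The centers and means arrays can then be handed to a plotting call, for example matplotlib's plt.plot(centers, means), to get the bin means on the y-axis against the bins on the x-axis.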
I think I must be doing something wrong. I am getting a strange number for the mean of the first bin. I have edited my question above to include the code from your updated answer. I am getting a mean of 83 for the first bin, which can't be right because the maximum value of obs is 17 and the maximum value of abs_error is 13, so I don't see how the mean of any bin could exceed that. Can you tell what I am doing wrong?
– HM14
Apr 1 at 7:16
Ignore my previous comment. It turns out I had some NaNs in the arrays that were distorting the results. I masked out the entries where there are NaNs and this seemed to fix my issue. However, if I get NaNs in the upper part of the result, does that mean there is no count in those bins? For example, I get mn=[ nan 3.18577858 2.82887524 ......1.57512014 1.53094066 1.7965252 1.98050336 2.29916244 3.06640482 4.66769505 3.16787195 nan nan nan nan nan] and bincount(idx)=[ 0 157 458 677 855 920 979 ....... 4 2 2 0 0 0 0 0 3133]
– HM14
Apr 1 at 7:47
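The comment does not show how the NaNs were masked, so the following is only one plausible way to do it, using boolean indexing on made-up data. It keeps the pairs where both obs and abs_error are present and then applies the same np.digitize and np.bincount steps to the cleaned arrays.

```python
import numpy as np

# Hypothetical data with missing values, chosen only for illustration.
obs = np.array([0.5, 1.5, np.nan, 1.2, 2.5])
abs_error = np.array([1.0, 2.0, 3.0, np.nan, 5.0])
bin_boundaries = np.array([0.0, 1.0, 2.0, 3.0])

# Keep only the pairs where both values are present.
valid = ~(np.isnan(obs) | np.isnan(abs_error))
obs_clean = obs[valid]
abs_error_clean = abs_error[valid]

idx = np.digitize(obs_clean, bin_boundaries)
sums = np.bincount(idx, weights=abs_error_clean)
counts = np.bincount(idx)

# Slot 0 holds values below the first edge; skip it and keep the real bins.
mn = sums[1:] / counts[1:]
print(mn)  # [1. 2. 5.]
```

A single NaN in abs_error is enough to turn the sum of its whole bin into NaN, which is why the filtering has to happen before the np.bincount call.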
@HM14 Yep, the mean of an empty bin is undefined, so NaN is the appropriate answer. Technically, what happens is a 0 / 0, where the first zero is the sum and the second zero is the element count.
– Paul Panzer
Apr 1 at 8:37
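If the 0 / 0 warning is unwanted, one option is to divide only where the count is positive and leave NaN elsewhere. The sketch below assumes per-bin sums and counts have already been computed; the values are made up for illustration.

```python
import numpy as np

# Hypothetical per-bin sums and counts; bins 0 and 2 are empty.
sums = np.array([0.0, 6.0, 0.0, 3.0])
counts = np.array([0, 2, 0, 1])

# Start from NaN everywhere, then fill in only the non-empty bins.
means = np.full(sums.shape, np.nan)
np.divide(sums, counts, out=means, where=counts > 0)
print(means)  # [nan  3. nan  3.]

# Boolean mask for selecting the bins that actually contain data.
nonempty = counts > 0
print(means[nonempty])  # [3. 3.]
```

The nonempty mask can be used to leave the empty bins out of a plot or a summary table.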