Why when using .apply on pandas dataframe is it giving incorrect result? My loop version works
I have two Pandas DataFrames:

- df_topics_temp contains a matrix with column id
- df_mapping contains a mapping of id to a parentID

I'm trying to populate the column parent.id in df_topics_temp with the parentID in df_mapping.

I have written a solution using loops, although it is very cumbersome. It works. My solution using pandas .apply on df_topics_temp doesn't work.

Solution 1 (works):

    def isnan(value):
        try:
            import math
            return math.isnan(float(value))
        except:
            return False

    for x in range(0, df_topics_temp['id'].count()):
        topic_id_loop = df_topics_temp['topic.id'].iloc[x]
        mapping_row = df_mapping[df_mapping['id'] == topic_id_loop]
        parent_id = mapping_row['parentId'].iloc[0]
        if isnan(parent_id):
            df_topics_temp['parent.id'].iloc[x] = mapping_row['id'].iloc[0]
        else:
            df_topics_temp['parent.id'].iloc[x] = topic_id_loop

Solution 2 (does not work):

    def map_function(x):
        df_topics_temp = df_mapping.loc[df_mapping['id'] == x]
        temp = df_topics_temp['parentId'].iloc[0]
        return temp

    df_topics_temp['parent.id'] = df_topics_temp['topic.id'].apply(map_function)
    df_topics_temp.head()

The second solution (pandas .apply) is not populating the parent.id column in df_topics_temp.

Thank you for the help.

Update 1

    <ipython-input-68-a2e8d9a21c26> in map_function(row)
          1 def map_function(row):
    ----> 2     row['parent.id'] = df_mapping.loc[df_mapping['id']==row['topic.id']]['parentId'].values[0]
          3     return row

    IndexError: ('index 0 is out of bounds for axis 0 with size 0', 'occurred at index 190999')

python pandas dataframe
First of all, I think you don't have to redefine isnan; the numpy version should work. – Itamar Mushkin, Mar 28 at 7:04
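
A minimal sketch of what that comment suggests, using made-up sample values rather than the question's data. `np.isnan` works element-wise on numeric input but raises a `TypeError` on object arrays that hold strings, which may be why the question wraps the check in `try`/`except`. `pd.isna` is shown as a more tolerant alternative; note that it treats `None` as missing, whereas the question's helper returns `False` for `None`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, not taken from the question's data.
values = pd.Series([1.0, np.nan, 3.0])

# np.isnan works element-wise on numeric arrays and Series.
numeric_mask = np.isnan(values)

# pd.isna also accepts None and non-numeric objects without raising.
mixed = pd.Series([1.0, None, "abc"], dtype=object)
mixed_mask = pd.isna(mixed)

print(numeric_mask.tolist())
print(mixed_mask.tolist())
```

Either mask can then be used directly for boolean indexing instead of testing one value at a time inside a loop.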
Asked Mar 28 at 6:51 by Jonathan Kruger; edited Apr 3 at 5:43
1 Answer
If I understand correctly, 'apply' on a DataFrame with axis=1 takes a row and returns a row.
So, you want your function to return a row; yours returns a value.

For example:

    # Set up the example DataFrames
    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame.from_dict({'name': ['alice', 'bob'], 'id': [1, 2]})
    mapping = pd.DataFrame.from_dict({'id': [1, 2, 3, 4], 'parent_id': [100, 200, 100, 200]})

    # Mapping function: receives one row and returns it with parent_id added
    def f(row):
        if any(mapping['id'] == row['id']):
            # Take the parent_id of the first mapping row with a matching id
            row['parent_id'] = mapping.loc[mapping['id'] == row['id']]['parent_id'].values[0]
        else:  # missing value
            row['parent_id'] = np.nan
        return row

    # axis=1 passes each row of df1 to f
    df1.apply(f, axis=1)

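
As an alternative sketch (a different technique from the row-wise `apply` above, not the answer's method), the same lookup can be expressed with `Series.map`, which avoids filtering `mapping` once per row. This assumes every `id` in `mapping` is unique; the third row (`'carol'`, id 9) is a hypothetical addition to show an id with no match.

```python
import numpy as np
import pandas as pd

# 'carol' with id 9 is a hypothetical row added to show the unmatched case.
df1 = pd.DataFrame({'name': ['alice', 'bob', 'carol'], 'id': [1, 2, 9]})
mapping = pd.DataFrame({'id': [1, 2, 3, 4], 'parent_id': [100, 200, 100, 200]})

# Build an id -> parent_id lookup; this assumes ids in `mapping` are unique.
lookup = mapping.set_index('id')['parent_id']

# Series.map leaves NaN where an id has no entry in the lookup.
df1['parent_id'] = df1['id'].map(lookup)
print(df1)
```

Because unmatched ids become NaN rather than raising, no explicit missing-value branch is needed in this version.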
Thank you very much Itamar. That makes sense. I will try it out. – Jonathan Kruger, Mar 31 at 8:15
Please see Update 1 in my original post above. It is the error that I'm getting when I apply the code you suggested to my dataframe. Please help. – Jonathan Kruger, Apr 3 at 5:47
First of all, please check that the offending row (190999) has a legitimate 'parent' by ID, and it's not a problem in the data. – Itamar Mushkin, Apr 3 at 7:45
Anyway, I've added a condition to handle missing values. It should handle your missing values and not result in an exception. – Itamar Mushkin, Apr 3 at 8:13
Thank you, it works. – Jonathan Kruger, Apr 4 at 14:39
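
The data check suggested in the comments can be sketched as follows. The `IndexError` in Update 1 ("index 0 is out of bounds for axis 0 with size 0") is what `.values[0]` raises when the filtered selection is empty, so the rows to inspect are those whose `topic.id` has no counterpart in `df_mapping['id']`. The two small frames below are hypothetical stand-ins that only reuse the question's column names.

```python
import pandas as pd

# Hypothetical stand-ins for df_topics_temp and df_mapping.
df_topics_temp = pd.DataFrame({'topic.id': [1, 2, 9]})
df_mapping = pd.DataFrame({'id': [1, 2, 3], 'parentId': [100, 200, 100]})

# Rows whose topic.id never appears in df_mapping['id'] would make
# `.values[0]` fail, because the filtered selection is empty.
unmatched = df_topics_temp[~df_topics_temp['topic.id'].isin(df_mapping['id'])]
print(unmatched)
```

Running the same filter on the real data should list every row that would trigger the exception, including the one reported at index 190999 if it is indeed unmatched.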
Answered Mar 28 at 7:13 by Itamar Mushkin; edited Apr 3 at 8:12