Why does .apply on a pandas DataFrame give an incorrect result when my loop version works?
























I have two Pandas DataFrames:




  1. df_topics_temp contains a matrix with a column id

  2. df_mapping contains a mapping of id to a parentId

I'm trying to populate the column parent.id in df_topics_temp with the parentId from df_mapping.

I have written a solution using loops. It is cumbersome, but it works. My solution using pandas .apply on df_topics_temp doesn't work.



Solution 1 (works):




def isnan(value):
    try:
        import math
        return math.isnan(float(value))
    except:
        return False

for x in range(0, df_topics_temp['id'].count()):
    topic_id_loop = df_topics_temp['topic.id'].iloc[x]
    mapping_row = df_mapping[df_mapping['id'] == topic_id_loop]
    parent_id = mapping_row['parentId'].iloc[0]

    if isnan(parent_id):
        df_topics_temp['parent.id'].iloc[x] = mapping_row['id'].iloc[0]
    else:
        df_topics_temp['parent.id'].iloc[x] = topic_id_loop



Solution 2 (does not work):




def map_function(x):
    df_topics_temp = df_mapping.loc[df_mapping['id'] == x]
    temp = df_topics_temp['parentId'].iloc[0]
    return temp

df_topics_temp['parent.id'] = df_topics_temp['topic.id'].apply(map_function)

df_topics_temp.head()



The second solution (pandas .apply) is not populating the parent.id column in df_topics_temp.



Thank you for the help



Update 1



<ipython-input-68-a2e8d9a21c26> in map_function(row)
      1 def map_function(row):
----> 2     row['parent.id'] = df_mapping.loc[df_mapping['id']==row['topic.id']]['parentId'].values[0]
      3     return row

IndexError: ('index 0 is out of bounds for axis 0 with size 0', 'occurred at index 190999')
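A minimal sketch of why the lookup in the traceback above can raise this IndexError: when no row of the mapping matches the id, the boolean selection is empty, and taking element 0 of an empty array fails. The small frame and the ids used here are invented for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for df_mapping; the values are invented.
df_mapping = pd.DataFrame({'id': [1, 2], 'parentId': [10.0, 20.0]})

def lookup(topic_id):
    # Same pattern as in the traceback: select matching rows, take the first.
    return df_mapping.loc[df_mapping['id'] == topic_id, 'parentId'].values[0]

found = lookup(1)          # id 1 exists in the mapping

try:
    lookup(99)             # id 99 has no row, so the selection is empty
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(found)
print(error_message)
```

If this is what happens at index 190999, the topic id in that row probably has no counterpart in df_mapping.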



































  • First of all - I think you don't have to redefine isnan, the numpy version should work.

    – Itamar Mushkin
    Mar 28 at 7:04
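As a sketch of the suggestion in this comment: numpy's isnan works on numeric values, while pandas' isna also accepts None and strings without raising, which is the case the hand-written isnan guards against with try/except. The sample values are invented.

```python
import numpy as np
import pandas as pd

values = [1.5, np.nan, None, 'abc']   # invented sample values

# pd.isna handles floats, None and strings without raising.
flags = [bool(pd.isna(v)) for v in values]

# np.isnan is fine for floats but raises TypeError for a string.
nan_is_nan = bool(np.isnan(np.nan))
try:
    np.isnan('abc')
    string_ok = True
except TypeError:
    string_ok = False

print(flags)
print(nan_is_nan, string_ok)
```

One difference to keep in mind: pd.isna treats None as missing, whereas the custom isnan in the question returns False for None.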

















python pandas dataframe

asked Mar 28 at 6:51, edited Apr 3 at 5:43
Jonathan Kruger



























1 Answer
















If I understand correctly, 'apply' on a DataFrame with axis=1 takes a row and returns a row.
So you want your function to return a row; yours returns a single value.
For example:



# set up the dataframes
import pandas as pd
import numpy as np
df1 = pd.DataFrame.from_dict({'name': ['alice', 'bob'], 'id': [1, 2]})
mapping = pd.DataFrame.from_dict({'id': [1, 2, 3, 4], 'parent_id': [100, 200, 100, 200]})

# mapping function: receives one row of df1 and returns it with parent_id filled in
def f(row):
    matches = mapping['id'] == row['id']
    if matches.any():
        row['parent_id'] = mapping.loc[matches, 'parent_id'].values[0]
    else:  # missing value
        row['parent_id'] = np.nan
    return row

df1.apply(f, axis=1)
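For larger frames, the same lookup can be written without a row-wise function. This is an alternative technique, not what the answer above uses: build a Series indexed by id and pass it to Series.map. Ids with no entry in the mapping come out as NaN, the same result as the else branch above. The sketch assumes the ids in mapping are unique; the frames repeat the example data plus one invented unmatched row ('carol', id 9).

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['alice', 'bob', 'carol'], 'id': [1, 2, 9]})
mapping = pd.DataFrame({'id': [1, 2, 3, 4], 'parent_id': [100, 200, 100, 200]})

# Lookup Series: index is id, values are parent_id (assumes unique ids).
lookup = mapping.set_index('id')['parent_id']

# map() looks up each id; ids absent from the lookup become NaN.
df1['parent_id'] = df1['id'].map(lookup)

print(df1)
```

Because the result is assigned back to a column, there is no need to return a whole row as with apply and axis=1.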
































  • Thank you very much Itamar. That makes sense. I will try it out

    – Jonathan Kruger
    Mar 31 at 8:15











  • Please see Update 1 in my original post above. It is the error that I'm getting when I apply the code you suggested to my dataframe. Please help

    – Jonathan Kruger
    Apr 3 at 5:47












  • First of all, please check that the offending row (190999) has a legitimate 'parent' by ID, and it's not a problem in the data.

    – Itamar Mushkin
    Apr 3 at 7:45











  • Anyway, I've added a condition to handle missing values. It should handle your missing values and not result in an exception.

    – Itamar Mushkin
    Apr 3 at 8:13











  • Thank you it works

    – Jonathan Kruger
    Apr 4 at 14:39
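The check suggested in the comments above, whether every id in the frame actually has an entry in the mapping, can be sketched with Series.isin. The frames here are invented; with the real data the same mask would presumably be built from df_topics_temp['topic.id'] and df_mapping['id'].

```python
import pandas as pd

# Invented stand-ins for df_topics_temp and df_mapping.
df_topics_temp = pd.DataFrame({'topic.id': [1, 2, 7, 2, 8]})
df_mapping = pd.DataFrame({'id': [1, 2, 3], 'parentId': [None, 1, 1]})

# True where the topic id has no matching row in the mapping.
unmatched = ~df_topics_temp['topic.id'].isin(df_mapping['id'])

# Rows that would make a positional lookup such as .values[0] fail.
missing_rows = df_topics_temp[unmatched]
missing_ids = sorted(missing_rows['topic.id'].unique().tolist())

print(missing_rows)
print(missing_ids)
```

An empty missing_rows would mean every topic id has a mapping entry, so the cause of the error would have to be looked for elsewhere.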










answered Mar 28 at 7:13, edited Apr 3 at 8:12
Itamar Mushkin














