Finding Difference Between Strings in Two Dataframes/Lists, Output DifferenceWhat is the difference between @staticmethod and @classmethod?Finding the index of an item given a list containing it in PythonWhat is the difference between Python's list methods append and extend?Python join: why is it string.join(list) instead of list.join(string)?What's the difference between lists and tuples?Difference between __str__ and __repr__?How do I concatenate two lists in Python?Get difference between two lists“Large data” work flows using pandasGet list from pandas DataFrame column headers

Idiomatic way to create an immutable and efficient class in C++?

Are spot colors limited and why CMYK mix is not treated same as spot color mix?

Is this homebrew "Faerie Fire Grenade" unbalanced?

Are sweatpants frowned upon on flights?

Why is the Ellipsoid Method of polynomial complexity?

Why are JWST optics not enclosed like HST?

Why did Starhopper's exhaust plume become brighter just before landing?

Pen test results for web application include a file from a forbidden directory that is not even used or referenced

is "prohibition against," a double negative?

Is this position a forced win for Black after move 14?

Get contents before a colon

I feel cheated by my new employer, does this sound right?

Why do IR remotes influence AM radios?

Why military weather satellites?

Is it possible for a person to be tricked into becoming a lich?

Can I lend a small amount of my own money to a bank at the federal funds rate?

Do universities maintain secret textbooks?

Why didn't Doc believe Marty was from the future?

What is the practical impact of using System.Random which is not cryptographically random?

Create a list of snaking numbers under 50,000

Which polygons can be turned inside out by a smooth deformation?

“I hope he visit us more often” Why is this wrong?

What is this "opened" cube called?

Group riding etiquette



Finding Difference Between Strings in Two Dataframes/Lists, Output Difference


What is the difference between @staticmethod and @classmethod?Finding the index of an item given a list containing it in PythonWhat is the difference between Python's list methods append and extend?Python join: why is it string.join(list) instead of list.join(string)?What's the difference between lists and tuples?Difference between __str__ and __repr__?How do I concatenate two lists in Python?Get difference between two lists“Large data” work flows using pandasGet list from pandas DataFrame column headers






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








0















So I have an excel sheet and I am trying to analyse the difference between two versions.
Specifically, I have two columns; A and B.
I imported into python and using pandas I made both A and B their own dataframes (respectively called dfA and dfB). Here they have the following:



key dfA dfB
1 cat bigcat
2 dog smalldog
3 mouse hugemouse
4 child normalchild



I am trying to output a third column containing the string difference between the two dataframes, so essentially a third dataframe/column:



ABdifference
big
small
huge
normal


I've looked into using the difflib library however I don't think it'll produce the results in a readable format



I'll paste the code of what I have so far, but it's really not much as I haven't coded in some time and I thought it'd be easier than I thought...



import pandas as pd
from pandas import ExcelWriter
import difflib

df = pd.read_excel('somesheet.xlsx', sheet_name='Diff')

first= df['A']
second = df['B']


i'm not married to the idea of using pandas and dataframes, i just assumed it was the best way to go about excel data.



If anyone could assist in anyway it would be hugely appreciated!



Cheers










share|improve this question


























  • You should post the rest of the columns as well, we can check if there are keycolumns which we can use to simply combine the data.

    – Erfan
    Mar 27 at 22:29











  • can't really post the raw data as it's sensitive (and spans 8000 rows and 40 columns) but i'll add an arbitrary key to my post

    – Danny Moncadea
    Mar 28 at 13:05

















0















So I have an excel sheet and I am trying to analyse the difference between two versions.
Specifically, I have two columns; A and B.
I imported into python and using pandas I made both A and B their own dataframes (respectively called dfA and dfB). Here they have the following:



key dfA dfB
1 cat bigcat
2 dog smalldog
3 mouse hugemouse
4 child normalchild



I am trying to output a third column containing the string difference between the two dataframes, so essentially a third dataframe/column:



ABdifference
big
small
huge
normal


I've looked into using the difflib library however I don't think it'll produce the results in a readable format



I'll paste the code of what I have so far, but it's really not much as I haven't coded in some time and I thought it'd be easier than I thought...



import pandas as pd
from pandas import ExcelWriter
import difflib

df = pd.read_excel('somesheet.xlsx', sheet_name='Diff')

first= df['A']
second = df['B']


i'm not married to the idea of using pandas and dataframes, i just assumed it was the best way to go about excel data.



If anyone could assist in anyway it would be hugely appreciated!



Cheers










share|improve this question


























  • You should post the rest of the columns as well, we can check if there are keycolumns which we can use to simply combine the data.

    – Erfan
    Mar 27 at 22:29











  • can't really post the raw data as it's sensitive (and spans 8000 rows and 40 columns) but i'll add an arbitrary key to my post

    – Danny Moncadea
    Mar 28 at 13:05













0












0








0








So I have an excel sheet and I am trying to analyse the difference between two versions.
Specifically, I have two columns; A and B.
I imported into python and using pandas I made both A and B their own dataframes (respectively called dfA and dfB). Here they have the following:



key dfA dfB
1 cat bigcat
2 dog smalldog
3 mouse hugemouse
4 child normalchild



I am trying to output a third column containing the string difference between the two dataframes, so essentially a third dataframe/column:



ABdifference
big
small
huge
normal


I've looked into using the difflib library however I don't think it'll produce the results in a readable format



I'll paste the code of what I have so far, but it's really not much as I haven't coded in some time and I thought it'd be easier than I thought...



import pandas as pd
from pandas import ExcelWriter
import difflib

df = pd.read_excel('somesheet.xlsx', sheet_name='Diff')

first= df['A']
second = df['B']


i'm not married to the idea of using pandas and dataframes, i just assumed it was the best way to go about excel data.



If anyone could assist in anyway it would be hugely appreciated!



Cheers










share|improve this question
















So I have an excel sheet and I am trying to analyse the difference between two versions.
Specifically, I have two columns; A and B.
I imported into python and using pandas I made both A and B their own dataframes (respectively called dfA and dfB). Here they have the following:



key dfA dfB
1 cat bigcat
2 dog smalldog
3 mouse hugemouse
4 child normalchild



I am trying to output a third column containing the string difference between the two dataframes, so essentially a third dataframe/column:



ABdifference
big
small
huge
normal


I've looked into using the difflib library however I don't think it'll produce the results in a readable format



I'll paste the code of what I have so far, but it's really not much as I haven't coded in some time and I thought it'd be easier than I thought...



import pandas as pd
from pandas import ExcelWriter
import difflib

df = pd.read_excel('somesheet.xlsx', sheet_name='Diff')

first= df['A']
second = df['B']


i'm not married to the idea of using pandas and dataframes, i just assumed it was the best way to go about excel data.



If anyone could assist in anyway it would be hugely appreciated!



Cheers







python excel pandas






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Mar 28 at 13:05







Danny Moncadea

















asked Mar 27 at 22:11









Danny MoncadeaDanny Moncadea

204 bronze badges




204 bronze badges















  • You should post the rest of the columns as well, we can check if there are keycolumns which we can use to simply combine the data.

    – Erfan
    Mar 27 at 22:29











  • can't really post the raw data as it's sensitive (and spans 8000 rows and 40 columns) but i'll add an arbitrary key to my post

    – Danny Moncadea
    Mar 28 at 13:05

















  • You should post the rest of the columns as well, we can check if there are keycolumns which we can use to simply combine the data.

    – Erfan
    Mar 27 at 22:29











  • can't really post the raw data as it's sensitive (and spans 8000 rows and 40 columns) but i'll add an arbitrary key to my post

    – Danny Moncadea
    Mar 28 at 13:05
















You should post the rest of the columns as well, we can check if there are keycolumns which we can use to simply combine the data.

– Erfan
Mar 27 at 22:29





You should post the rest of the columns as well, we can check if there are keycolumns which we can use to simply combine the data.

– Erfan
Mar 27 at 22:29













can't really post the raw data as it's sensitive (and spans 8000 rows and 40 columns) but i'll add an arbitrary key to my post

– Danny Moncadea
Mar 28 at 13:05





can't really post the raw data as it's sensitive (and spans 8000 rows and 40 columns) but i'll add an arbitrary key to my post

– Danny Moncadea
Mar 28 at 13:05












2 Answers
2






active

oldest

votes


















0















You can use Dataframe.apply with a lambda function:



print(dfA, 'n')
print(dfB)

col1
0 cat
1 dog
2 mouse
3 child

col2
0 bigcat
1 smalldog
2 hugemouse
3 normalchild


Combine the dataframe with pd.concat:



df_combined = pd.concat([dfA, dfB], axis=1)
print(df_combined)
col1 col2
0 cat bigcat
1 dog smalldog
2 mouse hugemouse
3 child normalchild


Use .apply with replace



df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1)

print(df_combined)
col1 col2 col
0 cat bigcat big
1 dog smalldog small
2 mouse hugemouse huge
3 child normalchild normal





share|improve this answer

























  • hey, thanks for the reply! unfortunately, i'm getting a TypeError when applying the lambda function: TypeError: 'str' object cannot be interpreted as an integer. The debug menu points to df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1) as the source of the error.. As for a key, I do have keys associated with both versions that i'm comparing. I'll update my original post

    – Danny Moncadea
    Mar 28 at 12:59












  • Is one of the columns you try to replace with an integer? For example is your col1 an integer? @DannyMoncadea

    – Erfan
    Mar 28 at 13:13











  • No, they're not integers they're all strings. In fact I tried it once with my data set and once again with the sample data i posted and it brought up the same error

    – Danny Moncadea
    Mar 28 at 14:32


















0















You could try the below formula:



=IF(FIND(A2,B2)>1,LEFT(B2,FIND(A2,B2)-1),IF(FIND(B2,B2)=1,RIGHT(B2,LEN(B2)-LEN(A2))))





share|improve this answer

























  • unfortunately that doesn't seem to work :( so my data set isn't strictly a string with added characters at the end, and this function doesn't seem to account for that and only looks at length from the end

    – Danny Moncadea
    Mar 28 at 12:47













Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55387244%2ffinding-difference-between-strings-in-two-dataframes-lists-output-difference%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









0















You can use Dataframe.apply with a lambda function:



print(dfA, 'n')
print(dfB)

col1
0 cat
1 dog
2 mouse
3 child

col2
0 bigcat
1 smalldog
2 hugemouse
3 normalchild


Combine the dataframe with pd.concat:



df_combined = pd.concat([dfA, dfB], axis=1)
print(df_combined)
col1 col2
0 cat bigcat
1 dog smalldog
2 mouse hugemouse
3 child normalchild


Use .apply with replace



df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1)

print(df_combined)
col1 col2 col
0 cat bigcat big
1 dog smalldog small
2 mouse hugemouse huge
3 child normalchild normal





share|improve this answer

























  • hey, thanks for the reply! unfortunately, i'm getting a TypeError when applying the lambda function: TypeError: 'str' object cannot be interpreted as an integer. The debug menu points to df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1) as the source of the error.. As for a key, I do have keys associated with both versions that i'm comparing. I'll update my original post

    – Danny Moncadea
    Mar 28 at 12:59












  • Is one of the columns you try to replace with an integer? For example is your col1 an integer? @DannyMoncadea

    – Erfan
    Mar 28 at 13:13











  • No, they're not integers they're all strings. In fact I tried it once with my data set and once again with the sample data i posted and it brought up the same error

    – Danny Moncadea
    Mar 28 at 14:32















0















You can use Dataframe.apply with a lambda function:



print(dfA, 'n')
print(dfB)

col1
0 cat
1 dog
2 mouse
3 child

col2
0 bigcat
1 smalldog
2 hugemouse
3 normalchild


Combine the dataframe with pd.concat:



df_combined = pd.concat([dfA, dfB], axis=1)
print(df_combined)
col1 col2
0 cat bigcat
1 dog smalldog
2 mouse hugemouse
3 child normalchild


Use .apply with replace



df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1)

print(df_combined)
col1 col2 col
0 cat bigcat big
1 dog smalldog small
2 mouse hugemouse huge
3 child normalchild normal





share|improve this answer

























  • hey, thanks for the reply! unfortunately, i'm getting a TypeError when applying the lambda function: TypeError: 'str' object cannot be interpreted as an integer. The debug menu points to df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1) as the source of the error.. As for a key, I do have keys associated with both versions that i'm comparing. I'll update my original post

    – Danny Moncadea
    Mar 28 at 12:59












  • Is one of the columns you try to replace with an integer? For example is your col1 an integer? @DannyMoncadea

    – Erfan
    Mar 28 at 13:13











  • No, they're not integers they're all strings. In fact I tried it once with my data set and once again with the sample data i posted and it brought up the same error

    – Danny Moncadea
    Mar 28 at 14:32













0














0










0









You can use Dataframe.apply with a lambda function:



print(dfA, 'n')
print(dfB)

col1
0 cat
1 dog
2 mouse
3 child

col2
0 bigcat
1 smalldog
2 hugemouse
3 normalchild


Combine the dataframe with pd.concat:



df_combined = pd.concat([dfA, dfB], axis=1)
print(df_combined)
col1 col2
0 cat bigcat
1 dog smalldog
2 mouse hugemouse
3 child normalchild


Use .apply with replace



df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1)

print(df_combined)
col1 col2 col
0 cat bigcat big
1 dog smalldog small
2 mouse hugemouse huge
3 child normalchild normal





share|improve this answer













You can use Dataframe.apply with a lambda function:



print(dfA, 'n')
print(dfB)

col1
0 cat
1 dog
2 mouse
3 child

col2
0 bigcat
1 smalldog
2 hugemouse
3 normalchild


Combine the dataframe with pd.concat:



df_combined = pd.concat([dfA, dfB], axis=1)
print(df_combined)
col1 col2
0 cat bigcat
1 dog smalldog
2 mouse hugemouse
3 child normalchild


Use .apply with replace



df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1)

print(df_combined)
col1 col2 col
0 cat bigcat big
1 dog smalldog small
2 mouse hugemouse huge
3 child normalchild normal






share|improve this answer












share|improve this answer



share|improve this answer










answered Mar 27 at 22:38









ErfanErfan

11.6k2 gold badges7 silver badges28 bronze badges




11.6k2 gold badges7 silver badges28 bronze badges















  • hey, thanks for the reply! unfortunately, i'm getting a TypeError when applying the lambda function: TypeError: 'str' object cannot be interpreted as an integer. The debug menu points to df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1) as the source of the error.. As for a key, I do have keys associated with both versions that i'm comparing. I'll update my original post

    – Danny Moncadea
    Mar 28 at 12:59












  • Is one of the columns you try to replace with an integer? For example is your col1 an integer? @DannyMoncadea

    – Erfan
    Mar 28 at 13:13











  • No, they're not integers they're all strings. In fact I tried it once with my data set and once again with the sample data i posted and it brought up the same error

    – Danny Moncadea
    Mar 28 at 14:32

















  • hey, thanks for the reply! unfortunately, i'm getting a TypeError when applying the lambda function: TypeError: 'str' object cannot be interpreted as an integer. The debug menu points to df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1) as the source of the error.. As for a key, I do have keys associated with both versions that i'm comparing. I'll update my original post

    – Danny Moncadea
    Mar 28 at 12:59












  • Is one of the columns you try to replace with an integer? For example is your col1 an integer? @DannyMoncadea

    – Erfan
    Mar 28 at 13:13











  • No, they're not integers they're all strings. In fact I tried it once with my data set and once again with the sample data i posted and it brought up the same error

    – Danny Moncadea
    Mar 28 at 14:32
















hey, thanks for the reply! unfortunately, i'm getting a TypeError when applying the lambda function: TypeError: 'str' object cannot be interpreted as an integer. The debug menu points to df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1) as the source of the error.. As for a key, I do have keys associated with both versions that i'm comparing. I'll update my original post

– Danny Moncadea
Mar 28 at 12:59






hey, thanks for the reply! unfortunately, i'm getting a TypeError when applying the lambda function: TypeError: 'str' object cannot be interpreted as an integer. The debug menu points to df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1) as the source of the error.. As for a key, I do have keys associated with both versions that i'm comparing. I'll update my original post

– Danny Moncadea
Mar 28 at 12:59














Is one of the columns you try to replace with an integer? For example is your col1 an integer? @DannyMoncadea

– Erfan
Mar 28 at 13:13





Is one of the columns you try to replace with an integer? For example is your col1 an integer? @DannyMoncadea

– Erfan
Mar 28 at 13:13













No, they're not integers they're all strings. In fact I tried it once with my data set and once again with the sample data i posted and it brought up the same error

– Danny Moncadea
Mar 28 at 14:32





No, they're not integers they're all strings. In fact I tried it once with my data set and once again with the sample data i posted and it brought up the same error

– Danny Moncadea
Mar 28 at 14:32













0















You could try the below formula:



=IF(FIND(A2,B2)>1,LEFT(B2,FIND(A2,B2)-1),IF(FIND(B2,B2)=1,RIGHT(B2,LEN(B2)-LEN(A2))))





share|improve this answer

























  • unfortunately that doesn't seem to work :( so my data set isn't strictly a string with added characters at the end, and this function doesn't seem to account for that and only looks at length from the end

    – Danny Moncadea
    Mar 28 at 12:47















0















You could try the below formula:



=IF(FIND(A2,B2)>1,LEFT(B2,FIND(A2,B2)-1),IF(FIND(B2,B2)=1,RIGHT(B2,LEN(B2)-LEN(A2))))





share|improve this answer

























  • unfortunately that doesn't seem to work :( so my data set isn't strictly a string with added characters at the end, and this function doesn't seem to account for that and only looks at length from the end

    – Danny Moncadea
    Mar 28 at 12:47













0














0










0









You could try the below formula:



=IF(FIND(A2,B2)>1,LEFT(B2,FIND(A2,B2)-1),IF(FIND(B2,B2)=1,RIGHT(B2,LEN(B2)-LEN(A2))))





share|improve this answer













You could try the below formula:



=IF(FIND(A2,B2)>1,LEFT(B2,FIND(A2,B2)-1),IF(FIND(B2,B2)=1,RIGHT(B2,LEN(B2)-LEN(A2))))






share|improve this answer












share|improve this answer



share|improve this answer










answered Mar 28 at 9:31









Error 1004Error 1004

5,4522 gold badges9 silver badges25 bronze badges




5,4522 gold badges9 silver badges25 bronze badges















  • unfortunately that doesn't seem to work :( so my data set isn't strictly a string with added characters at the end, and this function doesn't seem to account for that and only looks at length from the end

    – Danny Moncadea
    Mar 28 at 12:47

















  • unfortunately that doesn't seem to work :( so my data set isn't strictly a string with added characters at the end, and this function doesn't seem to account for that and only looks at length from the end

    – Danny Moncadea
    Mar 28 at 12:47
















unfortunately that doesn't seem to work :( so my data set isn't strictly a string with added characters at the end, and this function doesn't seem to account for that and only looks at length from the end

– Danny Moncadea
Mar 28 at 12:47





unfortunately that doesn't seem to work :( so my data set isn't strictly a string with added characters at the end, and this function doesn't seem to account for that and only looks at length from the end

– Danny Moncadea
Mar 28 at 12:47

















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55387244%2ffinding-difference-between-strings-in-two-dataframes-lists-output-difference%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Kamusi Yaliyomo Aina za kamusi | Muundo wa kamusi | Faida za kamusi | Dhima ya picha katika kamusi | Marejeo | Tazama pia | Viungo vya nje | UrambazajiKuhusu kamusiGo-SwahiliWiki-KamusiKamusi ya Kiswahili na Kiingerezakuihariri na kuongeza habari

Swift 4 - func physicsWorld not invoked on collision? The Next CEO of Stack OverflowHow to call Objective-C code from Swift#ifdef replacement in the Swift language@selector() in Swift?#pragma mark in Swift?Swift for loop: for index, element in array?dispatch_after - GCD in Swift?Swift Beta performance: sorting arraysSplit a String into an array in Swift?The use of Swift 3 @objc inference in Swift 4 mode is deprecated?How to optimize UITableViewCell, because my UITableView lags

Access current req object everywhere in Node.js ExpressWhy are global variables considered bad practice? (node.js)Using req & res across functionsHow do I get the path to the current script with Node.js?What is Node.js' Connect, Express and “middleware”?Node.js w/ express error handling in callbackHow to access the GET parameters after “?” in Express?Modify Node.js req object parametersAccess “app” variable inside of ExpressJS/ConnectJS middleware?Node.js Express app - request objectAngular Http Module considered middleware?Session variables in ExpressJSAdd properties to the req object in expressjs with Typescript