Finding Difference Between Strings in Two Dataframes/Lists, Output DifferenceWhat is the difference between @staticmethod and @classmethod?Finding the index of an item given a list containing it in PythonWhat is the difference between Python's list methods append and extend?Python join: why is it string.join(list) instead of list.join(string)?What's the difference between lists and tuples?Difference between __str__ and __repr__?How do I concatenate two lists in Python?Get difference between two lists“Large data” work flows using pandasGet list from pandas DataFrame column headers

Finding Difference Between Strings in Two Dataframes/Lists, Output DifferenceWhat is the difference between @staticmethod and @classmethod?Finding the index of an item given a list containing it in PythonWhat is the difference between Python's list methods append and extend?Python join: why is it string.join(list) instead of list.join(string)?What's the difference between lists and tuples?Difference between str and repr?How do I concatenate two lists in Python?Get difference between two lists“Large data” work flows using pandasGet list from pandas DataFrame column headers

Idiomatic way to create an immutable and efficient class in C++?

Are spot colors limited and why CMYK mix is not treated same as spot color mix?

Is this homebrew "Faerie Fire Grenade" unbalanced?

Are sweatpants frowned upon on flights?

Why is the Ellipsoid Method of polynomial complexity?

Why are JWST optics not enclosed like HST?

Why did Starhopper's exhaust plume become brighter just before landing?

Pen test results for web application include a file from a forbidden directory that is not even used or referenced

is "prohibition against," a double negative?

Is this position a forced win for Black after move 14?

Get contents before a colon

I feel cheated by my new employer, does this sound right?

Why do IR remotes influence AM radios?

Why military weather satellites?

Is it possible for a person to be tricked into becoming a lich?

Can I lend a small amount of my own money to a bank at the federal funds rate?

Do universities maintain secret textbooks?

Why didn't Doc believe Marty was from the future?

What is the practical impact of using System.Random which is not cryptographically random?

Create a list of snaking numbers under 50,000

Which polygons can be turned inside out by a smooth deformation?

“I hope he visit us more often” Why is this wrong?

What is this "opened" cube called?

Group riding etiquette

Finding Difference Between Strings in Two Dataframes/Lists, Output Difference

What is the difference between @staticmethod and @classmethod?Finding the index of an item given a list containing it in PythonWhat is the difference between Python's list methods append and extend?Python join: why is it string.join(list) instead of list.join(string)?What's the difference between lists and tuples?Difference between __str__ and __repr__?How do I concatenate two lists in Python?Get difference between two lists“Large data” work flows using pandasGet list from pandas DataFrame column headers

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;

So I have an excel sheet and I am trying to analyse the difference between two versions.
Specifically, I have two columns; A and B.
I imported into python and using pandas I made both A and B their own dataframes (respectively called dfA and dfB). Here they have the following:

key dfA dfB 1 cat bigcat 2 dog smalldog 3 mouse hugemouse 4 child normalchild

I am trying to output a third column containing the string difference between the two dataframes, so essentially a third dataframe/column:

ABdifference
big
small
huge
normal

I've looked into using the difflib library however I don't think it'll produce the results in a readable format

I'll paste the code of what I have so far, but it's really not much as I haven't coded in some time and I thought it'd be easier than I thought...

import pandas as pd
from pandas import ExcelWriter
import difflib

df = pd.read_excel('somesheet.xlsx', sheet_name='Diff')

first= df['A']
second = df['B']

i'm not married to the idea of using pandas and dataframes, i just assumed it was the best way to go about excel data.

If anyone could assist in anyway it would be hugely appreciated!

Cheers

edited Mar 28 at 13:05

asked Mar 27 at 22:11

Danny Moncadea

204 bronze badges

You should post the rest of the columns as well, we can check if there are keycolumns which we can use to simply combine the data.

– Erfan
Mar 27 at 22:29

can't really post the raw data as it's sensitive (and spans 8000 rows and 40 columns) but i'll add an arbitrary key to my post

– Danny Moncadea
Mar 28 at 13:05

add a comment |

key dfA dfB 1 cat bigcat 2 dog smalldog 3 mouse hugemouse 4 child normalchild

I am trying to output a third column containing the string difference between the two dataframes, so essentially a third dataframe/column:

ABdifference
big
small
huge
normal

I've looked into using the difflib library however I don't think it'll produce the results in a readable format

I'll paste the code of what I have so far, but it's really not much as I haven't coded in some time and I thought it'd be easier than I thought...

import pandas as pd
from pandas import ExcelWriter
import difflib

df = pd.read_excel('somesheet.xlsx', sheet_name='Diff')

first= df['A']
second = df['B']

i'm not married to the idea of using pandas and dataframes, i just assumed it was the best way to go about excel data.

If anyone could assist in anyway it would be hugely appreciated!

Cheers

edited Mar 28 at 13:05

asked Mar 27 at 22:11

Danny Moncadea

204 bronze badges

You should post the rest of the columns as well, we can check if there are keycolumns which we can use to simply combine the data.

– Erfan
Mar 27 at 22:29

can't really post the raw data as it's sensitive (and spans 8000 rows and 40 columns) but i'll add an arbitrary key to my post

– Danny Moncadea
Mar 28 at 13:05

add a comment |

key dfA dfB 1 cat bigcat 2 dog smalldog 3 mouse hugemouse 4 child normalchild

I am trying to output a third column containing the string difference between the two dataframes, so essentially a third dataframe/column:

ABdifference
big
small
huge
normal

I've looked into using the difflib library however I don't think it'll produce the results in a readable format

I'll paste the code of what I have so far, but it's really not much as I haven't coded in some time and I thought it'd be easier than I thought...

import pandas as pd
from pandas import ExcelWriter
import difflib

df = pd.read_excel('somesheet.xlsx', sheet_name='Diff')

first= df['A']
second = df['B']

i'm not married to the idea of using pandas and dataframes, i just assumed it was the best way to go about excel data.

If anyone could assist in anyway it would be hugely appreciated!

Cheers

edited Mar 28 at 13:05

asked Mar 27 at 22:11

Danny Moncadea

204 bronze badges

key dfA dfB 1 cat bigcat 2 dog smalldog 3 mouse hugemouse 4 child normalchild

I am trying to output a third column containing the string difference between the two dataframes, so essentially a third dataframe/column:

ABdifference
big
small
huge
normal

I've looked into using the difflib library however I don't think it'll produce the results in a readable format

I'll paste the code of what I have so far, but it's really not much as I haven't coded in some time and I thought it'd be easier than I thought...

import pandas as pd
from pandas import ExcelWriter
import difflib

df = pd.read_excel('somesheet.xlsx', sheet_name='Diff')

first= df['A']
second = df['B']

i'm not married to the idea of using pandas and dataframes, i just assumed it was the best way to go about excel data.

If anyone could assist in anyway it would be hugely appreciated!

Cheers

python excel pandas

edited Mar 28 at 13:05

asked Mar 27 at 22:11

Danny Moncadea

204 bronze badges

edited Mar 28 at 13:05

asked Mar 27 at 22:11

Danny Moncadea

204 bronze badges

edited Mar 28 at 13:05

asked Mar 27 at 22:11

Danny Moncadea

204 bronze badges

asked Mar 27 at 22:11

Danny Moncadea

204 bronze badges

asked Mar 27 at 22:11

Danny Moncadea

204 bronze badges

You should post the rest of the columns as well, we can check if there are keycolumns which we can use to simply combine the data.

– Erfan
Mar 27 at 22:29

can't really post the raw data as it's sensitive (and spans 8000 rows and 40 columns) but i'll add an arbitrary key to my post

– Danny Moncadea
Mar 28 at 13:05

add a comment |

You should post the rest of the columns as well, we can check if there are keycolumns which we can use to simply combine the data.

– Erfan
Mar 27 at 22:29

can't really post the raw data as it's sensitive (and spans 8000 rows and 40 columns) but i'll add an arbitrary key to my post

– Danny Moncadea
Mar 28 at 13:05

You should post the rest of the columns as well, we can check if there are keycolumns which we can use to simply combine the data.

– Erfan
Mar 27 at 22:29

can't really post the raw data as it's sensitive (and spans 8000 rows and 40 columns) but i'll add an arbitrary key to my post

– Danny Moncadea
Mar 28 at 13:05

add a comment |

2 Answers
2

active

oldest

votes

You can use Dataframe.apply with a lambda function:

print(dfA, 'n')
print(dfB)

 col1
0 cat
1 dog
2 mouse
3 child 

 col2
0 bigcat
1 smalldog
2 hugemouse
3 normalchild

Combine the dataframe with pd.concat:

df_combined = pd.concat([dfA, dfB], axis=1)
print(df_combined)
 col1 col2
0 cat bigcat
1 dog smalldog
2 mouse hugemouse
3 child normalchild

Use .apply with replace

df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1)

print(df_combined)
 col1 col2 col
0 cat bigcat big
1 dog smalldog small
2 mouse hugemouse huge
3 child normalchild normal

answered Mar 27 at 22:38

Erfan

11.6k2 gold badges7 silver badges28 bronze badges

hey, thanks for the reply! unfortunately, i'm getting a TypeError when applying the lambda function: TypeError: 'str' object cannot be interpreted as an integer. The debug menu points to df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1) as the source of the error.. As for a key, I do have keys associated with both versions that i'm comparing. I'll update my original post

– Danny Moncadea
Mar 28 at 12:59

Is one of the columns you try to replace with an integer? For example is your col1 an integer? @DannyMoncadea

– Erfan
Mar 28 at 13:13

No, they're not integers they're all strings. In fact I tried it once with my data set and once again with the sample data i posted and it brought up the same error

– Danny Moncadea
Mar 28 at 14:32

add a comment |

You could try the below formula:

=IF(FIND(A2,B2)>1,LEFT(B2,FIND(A2,B2)-1),IF(FIND(B2,B2)=1,RIGHT(B2,LEN(B2)-LEN(A2))))

answered Mar 28 at 9:31

Error 1004

5,4522 gold badges9 silver badges25 bronze badges

unfortunately that doesn't seem to work :( so my data set isn't strictly a string with added characters at the end, and this function doesn't seem to account for that and only looks at length from the end

– Danny Moncadea
Mar 28 at 12:47

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55387244%2ffinding-difference-between-strings-in-two-dataframes-lists-output-difference%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

You can use Dataframe.apply with a lambda function:

print(dfA, 'n')
print(dfB)

 col1
0 cat
1 dog
2 mouse
3 child 

 col2
0 bigcat
1 smalldog
2 hugemouse
3 normalchild

Combine the dataframe with pd.concat:

df_combined = pd.concat([dfA, dfB], axis=1)
print(df_combined)
 col1 col2
0 cat bigcat
1 dog smalldog
2 mouse hugemouse
3 child normalchild

Use .apply with replace

df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1)

print(df_combined)
 col1 col2 col
0 cat bigcat big
1 dog smalldog small
2 mouse hugemouse huge
3 child normalchild normal

answered Mar 27 at 22:38

Erfan

11.6k2 gold badges7 silver badges28 bronze badges

hey, thanks for the reply! unfortunately, i'm getting a TypeError when applying the lambda function: TypeError: 'str' object cannot be interpreted as an integer. The debug menu points to df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1) as the source of the error.. As for a key, I do have keys associated with both versions that i'm comparing. I'll update my original post

– Danny Moncadea
Mar 28 at 12:59

Is one of the columns you try to replace with an integer? For example is your col1 an integer? @DannyMoncadea

– Erfan
Mar 28 at 13:13

No, they're not integers they're all strings. In fact I tried it once with my data set and once again with the sample data i posted and it brought up the same error

– Danny Moncadea
Mar 28 at 14:32

add a comment |

You can use Dataframe.apply with a lambda function:

print(dfA, 'n')
print(dfB)

 col1
0 cat
1 dog
2 mouse
3 child 

 col2
0 bigcat
1 smalldog
2 hugemouse
3 normalchild

Combine the dataframe with pd.concat:

df_combined = pd.concat([dfA, dfB], axis=1)
print(df_combined)
 col1 col2
0 cat bigcat
1 dog smalldog
2 mouse hugemouse
3 child normalchild

Use .apply with replace

df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1)

print(df_combined)
 col1 col2 col
0 cat bigcat big
1 dog smalldog small
2 mouse hugemouse huge
3 child normalchild normal

answered Mar 27 at 22:38

Erfan

11.6k2 gold badges7 silver badges28 bronze badges

hey, thanks for the reply! unfortunately, i'm getting a TypeError when applying the lambda function: TypeError: 'str' object cannot be interpreted as an integer. The debug menu points to df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1) as the source of the error.. As for a key, I do have keys associated with both versions that i'm comparing. I'll update my original post

– Danny Moncadea
Mar 28 at 12:59

Is one of the columns you try to replace with an integer? For example is your col1 an integer? @DannyMoncadea

– Erfan
Mar 28 at 13:13

No, they're not integers they're all strings. In fact I tried it once with my data set and once again with the sample data i posted and it brought up the same error

– Danny Moncadea
Mar 28 at 14:32

add a comment |

You can use Dataframe.apply with a lambda function:

print(dfA, 'n')
print(dfB)

 col1
0 cat
1 dog
2 mouse
3 child 

 col2
0 bigcat
1 smalldog
2 hugemouse
3 normalchild

Combine the dataframe with pd.concat:

df_combined = pd.concat([dfA, dfB], axis=1)
print(df_combined)
 col1 col2
0 cat bigcat
1 dog smalldog
2 mouse hugemouse
3 child normalchild

Use .apply with replace

df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1)

print(df_combined)
 col1 col2 col
0 cat bigcat big
1 dog smalldog small
2 mouse hugemouse huge
3 child normalchild normal

answered Mar 27 at 22:38

Erfan

11.6k2 gold badges7 silver badges28 bronze badges

You can use Dataframe.apply with a lambda function:

print(dfA, 'n')
print(dfB)

 col1
0 cat
1 dog
2 mouse
3 child 

 col2
0 bigcat
1 smalldog
2 hugemouse
3 normalchild

Combine the dataframe with pd.concat:

df_combined = pd.concat([dfA, dfB], axis=1)
print(df_combined)
 col1 col2
0 cat bigcat
1 dog smalldog
2 mouse hugemouse
3 child normalchild

Use .apply with replace

df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1)

print(df_combined)
 col1 col2 col
0 cat bigcat big
1 dog smalldog small
2 mouse hugemouse huge
3 child normalchild normal

answered Mar 27 at 22:38

Erfan

11.6k2 gold badges7 silver badges28 bronze badges

answered Mar 27 at 22:38

Erfan

11.6k2 gold badges7 silver badges28 bronze badges

answered Mar 27 at 22:38

Erfan

11.6k2 gold badges7 silver badges28 bronze badges

answered Mar 27 at 22:38

Erfan

11.6k2 gold badges7 silver badges28 bronze badges

hey, thanks for the reply! unfortunately, i'm getting a TypeError when applying the lambda function: TypeError: 'str' object cannot be interpreted as an integer. The debug menu points to df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1) as the source of the error.. As for a key, I do have keys associated with both versions that i'm comparing. I'll update my original post

– Danny Moncadea
Mar 28 at 12:59

Is one of the columns you try to replace with an integer? For example is your col1 an integer? @DannyMoncadea

– Erfan
Mar 28 at 13:13

No, they're not integers they're all strings. In fact I tried it once with my data set and once again with the sample data i posted and it brought up the same error

– Danny Moncadea
Mar 28 at 14:32

add a comment |

hey, thanks for the reply! unfortunately, i'm getting a TypeError when applying the lambda function: TypeError: 'str' object cannot be interpreted as an integer. The debug menu points to df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1) as the source of the error.. As for a key, I do have keys associated with both versions that i'm comparing. I'll update my original post

– Danny Moncadea
Mar 28 at 12:59

Is one of the columns you try to replace with an integer? For example is your col1 an integer? @DannyMoncadea

– Erfan
Mar 28 at 13:13

No, they're not integers they're all strings. In fact I tried it once with my data set and once again with the sample data i posted and it brought up the same error

– Danny Moncadea
Mar 28 at 14:32

hey, thanks for the reply! unfortunately, i'm getting a TypeError when applying the lambda function: TypeError: 'str' object cannot be interpreted as an integer. The debug menu points to df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1) as the source of the error.. As for a key, I do have keys associated with both versions that i'm comparing. I'll update my original post

– Danny Moncadea
Mar 28 at 12:59

Is one of the columns you try to replace with an integer? For example is your col1 an integer? @DannyMoncadea

– Erfan
Mar 28 at 13:13

No, they're not integers they're all strings. In fact I tried it once with my data set and once again with the sample data i posted and it brought up the same error

– Danny Moncadea
Mar 28 at 14:32

add a comment |

You could try the below formula:

=IF(FIND(A2,B2)>1,LEFT(B2,FIND(A2,B2)-1),IF(FIND(B2,B2)=1,RIGHT(B2,LEN(B2)-LEN(A2))))

answered Mar 28 at 9:31

Error 1004

5,4522 gold badges9 silver badges25 bronze badges

unfortunately that doesn't seem to work :( so my data set isn't strictly a string with added characters at the end, and this function doesn't seem to account for that and only looks at length from the end

– Danny Moncadea
Mar 28 at 12:47

add a comment |

You could try the below formula:

=IF(FIND(A2,B2)>1,LEFT(B2,FIND(A2,B2)-1),IF(FIND(B2,B2)=1,RIGHT(B2,LEN(B2)-LEN(A2))))

answered Mar 28 at 9:31

Error 1004

5,4522 gold badges9 silver badges25 bronze badges

unfortunately that doesn't seem to work :( so my data set isn't strictly a string with added characters at the end, and this function doesn't seem to account for that and only looks at length from the end

– Danny Moncadea
Mar 28 at 12:47

add a comment |

You could try the below formula:

=IF(FIND(A2,B2)>1,LEFT(B2,FIND(A2,B2)-1),IF(FIND(B2,B2)=1,RIGHT(B2,LEN(B2)-LEN(A2))))

answered Mar 28 at 9:31

Error 1004

5,4522 gold badges9 silver badges25 bronze badges

You could try the below formula:

=IF(FIND(A2,B2)>1,LEFT(B2,FIND(A2,B2)-1),IF(FIND(B2,B2)=1,RIGHT(B2,LEN(B2)-LEN(A2))))

answered Mar 28 at 9:31

Error 1004

5,4522 gold badges9 silver badges25 bronze badges

answered Mar 28 at 9:31

Error 1004

5,4522 gold badges9 silver badges25 bronze badges

answered Mar 28 at 9:31

Error 1004

5,4522 gold badges9 silver badges25 bronze badges

answered Mar 28 at 9:31

Error 1004

5,4522 gold badges9 silver badges25 bronze badges

unfortunately that doesn't seem to work :( so my data set isn't strictly a string with added characters at the end, and this function doesn't seem to account for that and only looks at length from the end

– Danny Moncadea
Mar 28 at 12:47

add a comment |

unfortunately that doesn't seem to work :( so my data set isn't strictly a string with added characters at the end, and this function doesn't seem to account for that and only looks at length from the end

– Danny Moncadea
Mar 28 at 12:47

unfortunately that doesn't seem to work :( so my data set isn't strictly a string with added characters at the end, and this function doesn't seem to account for that and only looks at length from the end

– Danny Moncadea
Mar 28 at 12:47

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Styjun

2 Answers
2

Your Answer

Post as a guest

2 Answers
2

2 Answers
2

Post as a guest

Popular posts from this blog

Kamusi Yaliyomo Aina za kamusi | Muundo wa kamusi | Faida za kamusi | Dhima ya picha katika kamusi | Marejeo | Tazama pia | Viungo vya nje | UrambazajiKuhusu kamusiGo-SwahiliWiki-KamusiKamusi ya Kiswahili na Kiingerezakuihariri na kuongeza habari

은진 송씨 목차 역사 본관 분파 인물 조선 왕실과의 인척 관계 집성촌 항렬자 인구 같이 보기 각주 둘러보기 메뉴은진 송씨세종실록 149권, 지리지 충청도 공주목 은진현

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

2 Answers 2

2 Answers 2

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Kamusi Yaliyomo Aina za kamusi | Muundo wa kamusi | Faida za kamusi | Dhima ya picha katika kamusi | Marejeo | Tazama pia | Viungo vya nje | UrambazajiKuhusu kamusiGo-SwahiliWiki-KamusiKamusi ya Kiswahili na Kiingerezakuihariri na kuongeza habari

은진 송씨 목차 역사 본관 분파 인물 조선 왕실과의 인척 관계 집성촌 항렬자 인구 같이 보기 각주 둘러보기 메뉴은진 송씨세종실록 149권, 지리지 충청도 공주목 은진현

2 Answers
2

2 Answers
2

2 Answers
2