Finding Difference Between Strings in Two Dataframes/Lists, Output Difference
I have an Excel sheet and I am trying to analyse the difference between two versions.
Specifically, I have two columns, A and B.
I imported the sheet into Python and, using pandas, made A and B their own dataframes (called dfA and dfB respectively). They contain the following:
key  dfA    dfB
1    cat    bigcat
2    dog    smalldog
3    mouse  hugemouse
4    child  normalchild
I am trying to output a third column containing the string difference between the two dataframes, so essentially a third dataframe/column:
ABdifference
big
small
huge
normal
I've looked into using the difflib library; however, I don't think it'll produce the results in a readable format.
I'll paste the code I have so far, but it's really not much, as I haven't coded in some time and I expected this to be easier than it turned out to be:
import pandas as pd
from pandas import ExcelWriter
import difflib

# Read the 'Diff' sheet and pull out the two columns to compare
df = pd.read_excel('somesheet.xlsx', sheet_name='Diff')
first = df['A']
second = df['B']
I'm not married to the idea of using pandas and dataframes; I just assumed it was the best way to work with Excel data.
If anyone could assist in any way, it would be hugely appreciated!
Cheers
python excel pandas
edited Mar 28 at 13:05
asked Mar 27 at 22:11
Danny Moncadea
You should post the rest of the columns as well, so we can check whether there are key columns which we can use to simply combine the data.
– Erfan
Mar 27 at 22:29
I can't really post the raw data as it's sensitive (and spans 8000 rows and 40 columns), but I'll add an arbitrary key to my post.
– Danny Moncadea
Mar 28 at 13:05
2 Answers
You can use DataFrame.apply with a lambda function:
print(dfA, '\n')
print(dfB)
    col1
0    cat
1    dog
2  mouse
3  child

          col2
0       bigcat
1     smalldog
2    hugemouse
3  normalchild
Combine the dataframes with pd.concat:
df_combined = pd.concat([dfA, dfB], axis=1)
print(df_combined)
    col1         col2
0    cat       bigcat
1    dog     smalldog
2  mouse    hugemouse
3  child  normalchild
Use .apply with replace to remove the col1 text from the col2 text in each row:
df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1)
print(df_combined)
    col1         col2     col
0    cat       bigcat     big
1    dog     smalldog   small
2  mouse    hugemouse    huge
3  child  normalchild  normal
answered Mar 27 at 22:38
Erfan
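For anyone who wants to run the steps of this answer end to end, here is a minimal self-contained sketch. It uses the sample data from the question and the column names col1, col2 and col from the answer; how dfA and dfB are constructed is an assumption, since the answer does not show it.

```python
import pandas as pd

# Sample data from the question; building the frames this way is an assumption.
dfA = pd.DataFrame({"col1": ["cat", "dog", "mouse", "child"]})
dfB = pd.DataFrame({"col2": ["bigcat", "smalldog", "hugemouse", "normalchild"]})

# Place the two frames side by side (rows are matched on the index).
df_combined = pd.concat([dfA, dfB], axis=1)

# Row by row, delete the col1 text from the col2 text; what remains is the difference.
df_combined["col"] = df_combined.apply(
    lambda row: row["col2"].replace(row["col1"], ""), axis=1
)

print(df_combined)
```

Note that str.replace removes every occurrence of the col1 text, and it leaves the col2 text unchanged when the col1 text does not occur in it.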
Hey, thanks for the reply! Unfortunately, I'm getting a TypeError when applying the lambda function: TypeError: 'str' object cannot be interpreted as an integer. The debug menu points to
df_combined['col'] = df_combined.apply(lambda x: x['col2'].replace(x['col1'], ''), axis=1)
as the source of the error. As for a key, I do have keys associated with both versions that I'm comparing. I'll update my original post.
– Danny Moncadea
Mar 28 at 12:59
Is one of the columns you are trying to replace an integer? For example, is your col1 an integer? @DannyMoncadea
– Erfan
Mar 28 at 13:13
No, they're not integers; they're all strings. In fact, I tried it once with my data set and once again with the sample data I posted, and it raised the same error.
– Danny Moncadea
Mar 28 at 14:32
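The thread never establishes what caused the TypeError. One cheap thing to rule out is cells that are not strings, such as empty Excel cells read as NaN or numeric cells. The following is a minimal sketch, assuming columns named A and B as in the question; remove_substring is a hypothetical helper, not part of pandas, and nothing here is confirmed to fix the reported error.

```python
import pandas as pd


def remove_substring(whole, part):
    """Return `whole` with every occurrence of `part` removed (hypothetical helper)."""
    # Treat missing cells (NaN/None) as empty strings and coerce everything else to str.
    whole = "" if pd.isna(whole) else str(whole)
    part = "" if pd.isna(part) else str(part)
    # Nothing to remove when `part` is empty.
    return whole.replace(part, "") if part else whole


# Made-up data that mixes strings, a missing value and a number.
df = pd.DataFrame({
    "A": ["cat", "dog", None, 7],
    "B": ["bigcat", "smalldog", "hugemouse", "number7"],
})

# Pair the cells row by row and compute the difference for each pair.
df["ABdifference"] = [remove_substring(b, a) for a, b in zip(df["A"], df["B"])]

print(df)
```

Iterating over the zipped columns avoids DataFrame.apply entirely, which also makes it easier to see which pair of cells triggers an error.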
You could try the Excel formula below:
=IF(FIND(A2,B2)>1,LEFT(B2,FIND(A2,B2)-1),IF(FIND(B2,B2)=1,RIGHT(B2,LEN(B2)-LEN(A2))))
answered Mar 28 at 9:31
Error 1004
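For readers more comfortable with Python, the intent of the formula can be sketched as follows. This is an interpretation rather than a verified translation: formula_difference is a hypothetical name, and returning None where Excel's FIND would give a #VALUE! error is an assumption.

```python
def formula_difference(a, b):
    """Mirror the apparent intent of the Excel formula for one pair of cells."""
    pos = b.find(a)        # 0-based index; Excel's FIND is 1-based
    if pos == -1:
        return None        # assumption: stands in for Excel's #VALUE! error
    if pos > 0:
        return b[:pos]     # LEFT(B2, FIND(A2,B2)-1): the text before the match
    return b[len(a):]      # RIGHT(B2, LEN(B2)-LEN(A2)): the text after the match


pairs = [("cat", "bigcat"), ("dog", "smalldog"), ("mouse", "hugemouse"), ("child", "normalchild")]
for a, b in pairs:
    print(a, b, formula_difference(a, b))
```

Like the formula, this returns only the text on one side of the match, so comparing "cat" with "bigcatfood" gives "big" and drops "food".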
Unfortunately that doesn't seem to work :( My data set isn't strictly a string with characters added at the end, and this formula doesn't seem to account for that; it only looks at the length from the end.
– Danny Moncadea
Mar 28 at 12:47
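When the extra characters can appear anywhere in the string, difflib from the standard library can still give readable output if only the unmatched pieces are kept. The following is a minimal sketch, assuming the goal is the text that is present in B but not matched in A; added_text is a hypothetical helper name.

```python
import difflib


def added_text(old, new):
    """Return the parts of `new` that difflib does not match to `old` (hypothetical helper)."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    pieces = [
        new[j1:j2]
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag in ("insert", "replace")  # keep only text that is new or changed in `new`
    ]
    return "".join(pieces)


pairs = [("cat", "bigcat"), ("dog", "smalldog"), ("mouse", "hugemouse"), ("child", "normalchild")]
for old, new in pairs:
    print(old, new, added_text(old, new))
```

The unmatched pieces are joined without a separator, so comparing "cat" with "bigcatfood" yields "bigfood"; return the list of pieces instead if the positions matter.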