How to get the indexes of rows which has values of x number of features same while differing one feature?
Sample DataFrame:
import pandas as pd
df = pd.DataFrame({'Name': ['John', 'Peter', 'John', 'John', 'Donald'],
                   'City': ['Boston', 'Japan', 'Boston', 'Dallas', 'Japan'],
                   'Age': [23, 31, 21, 21, 22]})
What I want is a list of the indices of all rows that have the same 'Name' and 'City' but a different 'Age', using pandas.
In this case it should return [0, 2].
pandas dataframe
asked Mar 23 at 15:32
Naushad Shukoor
What should happen when there is a 6th row, John Boston 23? Do you want indices 0, 2 and 5 then?
– ALollz
Mar 23 at 20:43
Okay... I hate to break it now, but I'm removing all the duplicates (all values, including Age) beforehand, so the above case wouldn't happen at all.
– Naushad Shukoor
Mar 25 at 10:46
3 Answers
Try this:
df[df.duplicated(['Name', 'City'], keep=False) & ~df.duplicated(keep=False)]
   Name    City  Age
0  John  Boston   23
2  John  Boston   21
EDIT: The scenario that @ALollz pointed out can be handled with:
df = pd.DataFrame({'Name': ['John', 'Peter', 'John', 'John', 'Donald', 'John'],
                   'City': ['Boston', 'Japan', 'Boston', 'Dallas', 'Japan', 'Boston'],
                   'Age': [23, 31, 21, 21, 22, 23]})
df[df.duplicated(['Name', 'City'], keep=False)].drop_duplicates()
Output:
   Name    City  Age
0  John  Boston   23
2  John  Boston   21
edited Mar 24 at 6:04
answered Mar 23 at 15:35
anky_91
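The question asks for a list of indices rather than the filtered rows. A minimal sketch of how the mask from the answer above could be turned into that list, using the question's sample data; the variable names `same_keys`, `not_exact_copy` and `indices` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Peter', 'John', 'John', 'Donald'],
                   'City': ['Boston', 'Japan', 'Boston', 'Dallas', 'Japan'],
                   'Age': [23, 31, 21, 21, 22]})

# Rows whose (Name, City) pair occurs more than once ...
same_keys = df.duplicated(['Name', 'City'], keep=False)
# ... and that are not exact copies of another row.
not_exact_copy = ~df.duplicated(keep=False)

# Index the frame's index with the boolean mask, then convert to a plain list.
indices = df.index[same_keys & not_exact_copy].tolist()
print(indices)  # [0, 2]
```

Selecting from `df.index` instead of `df` skips building an intermediate DataFrame when only the labels are needed.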
"What I want is to get list of indices of all the rows which has same 'Name' and 'City' but different age"
I think this is a bit ambiguous: what if a Name-City group contains some entries with the same age and some that differ? Depending on your desired output, a filter built from groupby + transform + nunique may be required.
Sample Data:
Note the edge case I added here, where John Boston 23 is duplicated:
import pandas as pd
df = pd.DataFrame({'Name': ['John', 'Peter', 'John', 'John', 'Donald', 'John'],
                   'City': ['Boston', 'Japan', 'Boston', 'Dallas', 'Japan', 'Boston'],
                   'Age': [23, 31, 21, 21, 22, 23]})
#     Name    City  Age
#0    John  Boston   23
#1   Peter   Japan   31
#2    John  Boston   21
#3    John  Dallas   21
#4  Donald   Japan   22
#5    John  Boston   23
Code:
df[df.groupby(['Name', 'City']).Age.transform(pd.Series.nunique).gt(1)]
#   Name    City  Age
#0  John  Boston   23
#2  John  Boston   21
#5  John  Boston   23
With other solutions, the exact duplication may lead to unwanted output:
df[df.duplicated(['Name', 'City'], keep=False) & ~df.duplicated(keep=False)]
#   Name    City  Age
#2  John  Boston   21
edited Mar 23 at 20:42
answered Mar 23 at 20:35
ALollz
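To make the ambiguity concrete, here is a small sketch of the two readings on the six-row sample, returning index lists as the question asks. It uses the string alias `'nunique'` in place of `pd.Series.nunique`; the names `ages_per_group`, `all_rows` and `one_per_age` are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Peter', 'John', 'John', 'Donald', 'John'],
                   'City': ['Boston', 'Japan', 'Boston', 'Dallas', 'Japan', 'Boston'],
                   'Age': [23, 31, 21, 21, 22, 23]})

# Number of distinct ages in each (Name, City) group, broadcast to every row.
ages_per_group = df.groupby(['Name', 'City'])['Age'].transform('nunique')

# Reading 1: every row of a group that contains more than one distinct age.
all_rows = df.index[ages_per_group.gt(1)].tolist()

# Reading 2: the same groups, but keeping one row per distinct (Name, City, Age).
one_per_age = df[ages_per_group.gt(1)].drop_duplicates().index.tolist()

print(all_rows)     # [0, 2, 5]
print(one_per_age)  # [0, 2]
```

Which list is wanted depends on whether exact repeats such as row 5 should count; the asker's later comment suggests they are dropped beforehand, which corresponds to the second reading.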
Another method could be to use groupby():
df[df.groupby(['Name', 'City']).transform(len)['Age'] > 1]
or maybe in two steps using duplicated():
df = df.set_index('Age')
df[df.duplicated(['Name', 'City'], keep=False)].reset_index()
edited Mar 23 at 16:27
answered Mar 23 at 16:18
Loochie
This doesn't give the desired results. Also, I'm skeptical about how the groupby would fare on >~300 columns.
– Naushad Shukoor
Mar 23 at 17:18
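For the many-column case raised in the comment, one possible sketch avoids groupby entirely: drop exact duplicates first (as the asker says is already done), then flag rows that repeat on every column except the one allowed to differ. The helper name `rows_differing_only_in` is hypothetical, and no timing on ~300 columns is claimed here:

```python
import pandas as pd

def rows_differing_only_in(df, column):
    """Index labels of rows that match on every other column but differ in `column`."""
    # After removing exact copies, rows that share all other columns
    # must differ in `column`.
    deduped = df.drop_duplicates()
    # Every column except the one that is allowed to differ.
    keys = [c for c in deduped.columns if c != column]
    return deduped.index[deduped.duplicated(keys, keep=False)].tolist()

df = pd.DataFrame({'Name': ['John', 'Peter', 'John', 'John', 'Donald', 'John'],
                   'City': ['Boston', 'Japan', 'Boston', 'Dallas', 'Japan', 'Boston'],
                   'Age': [23, 31, 21, 21, 22, 23]})

print(rows_differing_only_in(df, 'Age'))  # [0, 2]
```

Building `keys` from `df.columns` means the list of matching features does not have to be typed out by hand, whatever the number of columns.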