How to get the indexes of rows that have the same values in x features but differ in one feature?


Sample DataFrame:

import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Peter', 'John', 'John', 'Donald'],
                   'City': ['Boston', 'Japan', 'Boston', 'Dallas', 'Japan'],
                   'Age': [23, 31, 21, 21, 22]})

I want to get a list of the indices of all rows that have the same 'Name' and 'City' but a different 'Age', using pandas.

In this case, it should return [0, 2].







Tags: pandas, dataframe






asked Mar 23 at 15:32 by Naushad Shukoor












  • What should happen when there is a 6th row John Boston 23? Do you want indices 0,2 and 5 then?

    – ALollz
    Mar 23 at 20:43











  • Okay... I hate to break it now, but I'm removing all the duplicates (all values, including Age) beforehand, so the above case wouldn't happen at all.

    – Naushad Shukoor
    Mar 25 at 10:46

















3 Answers






Try this:

df[df.duplicated(['Name', 'City'], keep=False) & ~df.duplicated(keep=False)]

Output:

   Name    City  Age
0  John  Boston   23
2  John  Boston   21

EDIT: The scenario that @ALollz pointed out can be handled using:

df = pd.DataFrame({'Name': ['John', 'Peter', 'John', 'John', 'Donald', 'John'],
                   'City': ['Boston', 'Japan', 'Boston', 'Dallas', 'Japan', 'Boston'],
                   'Age': [23, 31, 21, 21, 22, 23]})
df[df.duplicated(['Name', 'City'], keep=False)].drop_duplicates()

Output:

   Name    City  Age
0  John  Boston   23
2  John  Boston   21






answered Mar 23 at 15:35 by anky_91 (edited Mar 24 at 6:04)
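The question asks for a list of index labels rather than the filtered rows. One way to get there from the mask in the answer above is `.index[...]` followed by `.tolist()`. This is a minimal sketch on the question's five-row sample; the variable names `same_key`, `not_exact_copy`, `mask`, and `indices` are illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Peter', 'John', 'John', 'Donald'],
                   'City': ['Boston', 'Japan', 'Boston', 'Dallas', 'Japan'],
                   'Age': [23, 31, 21, 21, 22]})

# Rows whose (Name, City) pair occurs more than once ...
same_key = df.duplicated(['Name', 'City'], keep=False)
# ... and that are not exact copies of another row.
not_exact_copy = ~df.duplicated(keep=False)

mask = same_key & not_exact_copy
indices = df.index[mask].tolist()
print(indices)  # [0, 2]
```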























Quoting the question: "... get list of indices of all the rows which has same 'Name' and 'City' but different age"

I think this is a bit ambiguous: what if a Name-City group has some entries with the same age and some that differ? Depending on your desired output, filtering with groupby + transform + nunique may be required.

Sample Data:

Note the edge case I added here, where John Boston 23 is duplicated:

import pandas as pd
df = pd.DataFrame({'Name': ['John', 'Peter', 'John', 'John', 'Donald', 'John'],
                   'City': ['Boston', 'Japan', 'Boston', 'Dallas', 'Japan', 'Boston'],
                   'Age': [23, 31, 21, 21, 22, 23]})

#      Name    City  Age
# 0    John  Boston   23
# 1   Peter   Japan   31
# 2    John  Boston   21
# 3    John  Dallas   21
# 4  Donald   Japan   22
# 5    John  Boston   23

Code:

df[df.groupby(['Name', 'City']).Age.transform(pd.Series.nunique).gt(1)]

#    Name    City  Age
# 0  John  Boston   23
# 2  John  Boston   21
# 5  John  Boston   23

With other solutions, the exact duplication may lead to an unwanted output:

df[df.duplicated(['Name', 'City'], keep=False) & ~df.duplicated(keep=False)]

#    Name    City  Age
# 2  John  Boston   21






answered Mar 23 at 20:35 by ALollz (edited Mar 23 at 20:42)
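The groupby + transform + nunique idea extends naturally to the "x features equal, one feature different" wording of the title. The helper below is a sketch, not part of the answer: the name `indices_differing_in` and its parameters are hypothetical, and it assumes the key columns contain no missing values (by default `groupby` leaves out rows whose keys are NaN).

```python
import pandas as pd


def indices_differing_in(df, same_cols, diff_col):
    """Return index labels of rows whose `same_cols` values match at least
    one other row while `diff_col` takes more than one distinct value
    within that group. (Hypothetical helper, for illustration only.)"""
    n_distinct = df.groupby(same_cols)[diff_col].transform('nunique')
    return df.index[n_distinct > 1].tolist()


df = pd.DataFrame({'Name': ['John', 'Peter', 'John', 'John', 'Donald', 'John'],
                   'City': ['Boston', 'Japan', 'Boston', 'Dallas', 'Japan', 'Boston'],
                   'Age': [23, 31, 21, 21, 22, 23]})

result = indices_differing_in(df, ['Name', 'City'], 'Age')
print(result)  # [0, 2, 5]
```

On the six-row sample this keeps index 5 as well, because the exact copy of row 0 still belongs to a group with more than one distinct age. Calling the helper on `df.drop_duplicates()` instead returns `[0, 2]`.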





















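Another method is to use groupby():

# Keep rows whose (Name, City) group contains more than one row
df[df.groupby(['Name', 'City'])['Age'].transform(len) > 1]

Or, in two steps, using duplicated():

# Note: reset_index() renumbers the rows, so the original index labels are lost
df = df.set_index('Age')
df[df.duplicated(['Name', 'City'], keep=False)].reset_index()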






answered Mar 23 at 16:18 by Loochie (edited Mar 23 at 16:27)












  • This doesn't give the desired results. Also, I'm skeptical about how the groupby would fare on >~300 columns.

    – Naushad Shukoor
    Mar 23 at 17:18
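For the wide-frame concern raised in the comment above, one option is to skip `groupby` and build the key list from the column names. This is a sketch under the asker's stated assumption that exact duplicate rows are removed first; in that situation, any two rows that agree on every column except one must differ in that column. The column names `f0`, `f1`, `f2`, and `target` are made up for the example, and no performance claim is intended.

```python
import pandas as pd

# Small stand-in for a wide frame; the column names are hypothetical.
df = pd.DataFrame({'f0': [1, 1, 2, 2, 1],
                   'f1': ['a', 'a', 'b', 'b', 'a'],
                   'f2': [0.5, 0.5, 0.7, 0.7, 0.5],
                   'target': [10, 20, 30, 30, 10]})

# Step 1: drop exact duplicate rows (keeps the first occurrence).
deduped = df.drop_duplicates()

# Step 2: every column except the one allowed to differ forms the key.
diff_col = 'target'
key_cols = [c for c in deduped.columns if c != diff_col]

# Step 3: rows sharing a key with another row must differ in `diff_col`,
# because exact copies were removed in step 1.
mask = deduped.duplicated(subset=key_cols, keep=False)
indices = deduped.index[mask].tolist()
print(indices)  # [0, 1]
```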
















