


Grouping observations by cutoff time

















I have a list of cutoff times: [16:30:00.100, 16:30:00.200, 16:30:00.350, 16:30:00.450].



And my observations are as follows:



16:30:00.095 A
16:30:00.097 B
16:30:00.122 C
16:30:00.255 D
16:30:00.322 E
16:30:00.420 F
16:30:00.569 G


I want to group my observations by cutoff time. Specifically, I want to see which cutoff time captures each observation, where an observation belongs to the latest cutoff that precedes it. For example, the first cutoff is early enough to catch C, but too late for A and B. The desired output should look something like this:



cutoff          observations captured
16:30:00.100    C
16:30:00.200    D E
16:30:00.350    F
16:30:00.450    G
not possible    A B


I have tried using pd.cut, but as far as I can tell it does not handle times with millisecond precision. Any help will be greatly appreciated. Thanks!
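On the millisecond concern, here is a small sketch, assuming the times are available as strings in the format shown above. It only illustrates that pandas timedeltas keep sub-second precision, so comparisons against a cutoff behave as expected; the variable names are made up for the example.

```python
import pandas as pd

# Two of the observation times from the question, parsed as timedeltas.
times = pd.to_timedelta(['16:30:00.095', '16:30:00.122'])
cutoff = pd.Timedelta('16:30:00.100')

# Element-wise comparison against the first cutoff.
before_cutoff = (times < cutoff).tolist()

# The gap between observation C and the first cutoff is kept at millisecond level.
gap = times[1] - cutoff

print(before_cutoff)
print(gap)
```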

















      python-3.x pandas
















      asked Mar 28 at 3:47









      Adrian Y


























          1 Answer
















          I think the pd.cut idea works well. Convert the times to timedeltas with to_timedelta, bin them with cut, replace the non-matching values with fillna, and finally aggregate with join:



          import pandas as pd

          print(df)
                     time col
          0  16:30:00.095   A
          1  16:30:00.097   B
          2  16:30:00.122   C
          3  16:30:00.255   D
          4  16:30:00.322   E
          5  16:30:00.420   F
          6  16:30:00.569   G

          # Parse the time strings as timedeltas so they can be binned.
          df['time'] = pd.to_timedelta(df['time'].astype(str))

          L = ['16:30:00.100', '16:30:00.200', '16:30:00.350', '16:30:00.450']
          # Bin edges: the cutoffs plus an upper bound, so the last cutoff gets its own bin.
          v = pd.to_timedelta(L + [pd.Timedelta.max])
          # Each bin is labelled with the cutoff that opens it.
          df['b'] = pd.cut(df['time'], bins=v, labels=L)
          # Observations before the first cutoff fall outside every bin and come back as NaN.
          df['b'] = df['b'].cat.add_categories(['not possible'])
          df['b'] = df['b'].fillna('not possible')
          print(df)
                       time col             b
          0 16:30:00.095000   A  not possible
          1 16:30:00.097000   B  not possible
          2 16:30:00.122000   C  16:30:00.100
          3 16:30:00.255000   D  16:30:00.200
          4 16:30:00.322000   E  16:30:00.200
          5 16:30:00.420000   F  16:30:00.350
          6 16:30:00.569000   G  16:30:00.450

          # Join the observations that share a cutoff into one string per group.
          df2 = df.groupby('b')['col'].apply(', '.join).reset_index()
          print(df2)
                        b   col
          0  16:30:00.100     C
          1  16:30:00.200  D, E
          2  16:30:00.350     F
          3  16:30:00.450     G
          4  not possible  A, B
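
The answer prints df without showing how it was built. Below is a self-contained sketch of the same pd.cut approach, assuming df is created from the question's sample data with the column names time and col seen in the printed output. The name edges and the explicit observed=True are choices made for this sketch, not part of the original answer.

```python
import pandas as pd

# Sample data from the question; column names follow the answer's printed df.
df = pd.DataFrame({
    'time': ['16:30:00.095', '16:30:00.097', '16:30:00.122', '16:30:00.255',
             '16:30:00.322', '16:30:00.420', '16:30:00.569'],
    'col': list('ABCDEFG'),
})
df['time'] = pd.to_timedelta(df['time'])

L = ['16:30:00.100', '16:30:00.200', '16:30:00.350', '16:30:00.450']
# Bin edges: every cutoff plus an open-ended upper bound for the last bin.
edges = pd.to_timedelta(L).append(pd.TimedeltaIndex([pd.Timedelta.max]))

# Label each bin with the cutoff that opens it; rows before the first cutoff get NaN.
df['b'] = pd.cut(df['time'], bins=edges, labels=L)
df['b'] = df['b'].cat.add_categories(['not possible']).fillna('not possible')

# Join the observations that share a cutoff.
df2 = df.groupby('b', observed=True)['col'].apply(', '.join).reset_index()
print(df2)
```

Because pd.cut uses right-closed bins by default, an observation that falls exactly on a cutoff is assigned to the previous bin; passing right=False would change that if it matters for the data.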






























          • Thanks for the help! One further thing, though: if L contains duplicates, is it possible to append col to the last instance of the duplicates, instead of using duplicates='drop'?

            – Adrian Y
            Apr 1 at 3:54











          • Also, is it possible to put each new element of col in the next cell, instead of separating them with a comma?

            – Adrian Y
            Apr 1 at 4:07











          • @AdrianY - Unfortunately, L cannot contain duplicates for pd.cut. For the next cell, do you mean omitting df2 = df.groupby('b')['col'].apply(', '.join).reset_index() ?

            – jezrael
            Apr 2 at 5:17
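
The two follow-up questions can be sketched as below. This is one possible reading, not something confirmed in the thread: "next cell" is taken to mean one observation per row, and the duplicated cutoff list is a hypothetical example. The bins are built from the de-duplicated cutoffs, and the joined observations are then written only on the last occurrence of each cutoff in L.

```python
import pandas as pd

# Hypothetical cutoff list with a repeated entry (not from the original post).
L = ['16:30:00.100', '16:30:00.200', '16:30:00.200', '16:30:00.350', '16:30:00.450']

df = pd.DataFrame({
    'time': pd.to_timedelta(['16:30:00.095', '16:30:00.097', '16:30:00.122',
                             '16:30:00.255', '16:30:00.322', '16:30:00.420',
                             '16:30:00.569']),
    'col': list('ABCDEFG'),
})

# pd.cut needs unique bin edges, so bin on the de-duplicated cutoffs (order kept).
unique_L = list(dict.fromkeys(L))
edges = pd.to_timedelta(unique_L).append(pd.TimedeltaIndex([pd.Timedelta.max]))
# Plain strings (NaN where no cutoff applies) are simpler to map than categories.
df['b'] = pd.cut(df['time'], bins=edges, labels=unique_L).astype(object)

# One observation per row: skip the join and keep the labelled rows as they are.
per_observation = df.dropna(subset=['b'])[['b', 'col']]

# One row per entry of L; the joined observations go on the last duplicate only.
joined = df.groupby('b')['col'].apply(', '.join)
out = pd.DataFrame({'b': L})
is_last = ~out['b'].duplicated(keep='last')
out['col'] = out['b'].map(joined).where(is_last, '')
print(out)
```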




















          answered Mar 28 at 6:31









          jezrael















