
Grouping observations by cutoff time


I have a list of cutoff times: ['16:30:00.100', '16:30:00.200', '16:30:00.350', '16:30:00.450'].

And my observations are as follows:

16:30:00.095 A
16:30:00.097 B
16:30:00.122 C
16:30:00.255 D
16:30:00.322 E
16:30:00.420 F
16:30:00.569 G

I want to group my observations by cutoff time. Specifically, I want to see which cutoff time captures each observation: the first cutoff (16:30:00.100) is fast enough to catch C but too slow for A and B, so each observation should be assigned to the latest cutoff that comes before it. The desired output looks something like this:

cutoff          observations captured
16:30:00.100    C
16:30:00.200    D E
16:30:00.350    F
16:30:00.450    G
not possible    A B
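The assignment rule above (each observation goes to the latest cutoff at or before it) can be sketched in plain Python with the standard-library bisect module. This is an illustration of the rule only, not the pandas solution; the helper name latest_cutoff is hypothetical, and treating an observation stamped exactly at a cutoff as captured by that cutoff is an assumption the question does not settle.

```python
from bisect import bisect_right
from datetime import time

# Cutoffs and observations from the question; cutoffs must be sorted ascending.
cutoffs = [time.fromisoformat(s) for s in
           ["16:30:00.100", "16:30:00.200", "16:30:00.350", "16:30:00.450"]]
observations = [
    ("16:30:00.095", "A"), ("16:30:00.097", "B"), ("16:30:00.122", "C"),
    ("16:30:00.255", "D"), ("16:30:00.322", "E"), ("16:30:00.420", "F"),
    ("16:30:00.569", "G"),
]


def latest_cutoff(t, cutoffs):
    """Return the latest cutoff <= t, or None if t precedes every cutoff."""
    i = bisect_right(cutoffs, t)  # number of cutoffs <= t
    return cutoffs[i - 1] if i > 0 else None


# Collect observation labels under the cutoff that captures them (None = not possible).
groups = {}
for stamp, label in observations:
    key = latest_cutoff(time.fromisoformat(stamp), cutoffs)
    groups.setdefault(key, []).append(label)

for key, labels in groups.items():
    name = "not possible" if key is None else key.isoformat(timespec="milliseconds")
    print(name, *labels)
```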

I have tried pd.cut, but as far as I know it does not support millisecond resolution for times. Any help would be greatly appreciated. Thanks!
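For what it's worth, pd.cut does appear to work at millisecond resolution once the strings are converted to Timedelta values with pd.to_timedelta (the answer below relies on the same idea). A small sketch on a subset of the question's data, with the label strings chosen for illustration:

```python
import pandas as pd

# Converting "HH:MM:SS.fff" strings to Timedelta keeps millisecond precision.
obs = pd.to_timedelta(pd.Series(["16:30:00.095", "16:30:00.122", "16:30:00.255"]))
edges = pd.to_timedelta(["16:30:00.100", "16:30:00.200", "16:30:00.350"])

# pd.cut bins timedelta64 data against timedelta edges:
# (16:30:00.100, 16:30:00.200] and (16:30:00.200, 16:30:00.350].
# Values outside every bin (here 16:30:00.095) become NaN.
binned = pd.cut(obs, bins=edges, labels=["16:30:00.100", "16:30:00.200"])
print(binned)
```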

asked Mar 28 at 3:47

Adrian Y


python-3.x pandas


1 Answer

I think the idea of using cut works nicely. First convert the time data to timedeltas with to_timedelta, then replace non-matching values with fillna, and finally aggregate with join:

import pandas as pd

print(df)
 time col
0 16:30:00.095 A
1 16:30:00.097 B
2 16:30:00.122 C
3 16:30:00.255 D
4 16:30:00.322 E
5 16:30:00.420 F
6 16:30:00.569 G

df['time'] = pd.to_timedelta(df['time'].astype(str))

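L = ['16:30:00.100', '16:30:00.200', '16:30:00.350', '16:30:00.450']
# append Timedelta.max as the last edge so times after the final cutoff fall into a bin
v = pd.to_timedelta(L + [pd.Timedelta.max])
# each bin (cutoff_i, cutoff_i+1] is labelled with its lower cutoff; earlier times become NaN
df['b'] = pd.cut(df['time'], bins=v, labels=L)
# a categorical cannot be filled with a value that is not a category, so add it first
df['b'] = df['b'].cat.add_categories(['not possible'])
df['b'] = df['b'].fillna('not possible')
print(df)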
 time col b
0 16:30:00.095000 A not possible
1 16:30:00.097000 B not possible
2 16:30:00.122000 C 16:30:00.100
3 16:30:00.255000 D 16:30:00.200
4 16:30:00.322000 E 16:30:00.200
5 16:30:00.420000 F 16:30:00.350
6 16:30:00.569000 G 16:30:00.450

# join the observations captured by each cutoff into one string
df2 = df.groupby('b')['col'].apply(', '.join).reset_index()
print(df2)
 b col
0 16:30:00.100 C
1 16:30:00.200 D, E
2 16:30:00.350 F
3 16:30:00.450 G
4 not possible A, B
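A self-contained version of the steps above, for anyone who wants to run it end to end. Building df from the question's data is an assumption (the answer starts from an existing df), and so is swapping the pd.Timedelta.max upper edge for "1 days", on the assumption that time-of-day values stay below 24 hours. observed=True is added so only categories that actually occur are grouped.

```python
import pandas as pd

# Rebuild the question's data (the answer's snippet assumes df already exists).
df = pd.DataFrame({
    "time": ["16:30:00.095", "16:30:00.097", "16:30:00.122", "16:30:00.255",
             "16:30:00.322", "16:30:00.420", "16:30:00.569"],
    "col": list("ABCDEFG"),
})
df["time"] = pd.to_timedelta(df["time"])

L = ["16:30:00.100", "16:30:00.200", "16:30:00.350", "16:30:00.450"]
# Assumption: time-of-day values stay below 24 hours, so "1 days" works as the
# final edge in place of pd.Timedelta.max.
edges = pd.to_timedelta(L + ["1 days"])

# Bins are right-closed by default: (cutoff_i, cutoff_i+1], labelled by cutoff_i.
df["b"] = pd.cut(df["time"], bins=edges, labels=L)
df["b"] = df["b"].cat.add_categories(["not possible"]).fillna("not possible")

# observed=True keeps only categories that actually occur in the data.
df2 = df.groupby("b", observed=True)["col"].apply(", ".join).reset_index()
print(df2)
```

Passing right=False to pd.cut would make each bin [cutoff_i, cutoff_i+1), so an observation stamped exactly at a cutoff would be captured by that cutoff; with this data the result is the same, since no observation coincides with a cutoff.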

answered Mar 28 at 6:31

jezrael

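An alternative to pd.cut, not part of the answer above, is pd.merge_asof with direction="backward", which matches each observation to the last cutoff at or before its time. This is a sketch under the assumption that both frames are sorted by their time keys (merge_asof requires it); the label column on the cutoff frame is introduced here purely for readable output.

```python
import pandas as pd

# Observations from the question, sorted by time (merge_asof requires sorted keys).
obs = pd.DataFrame({
    "time": pd.to_timedelta(["16:30:00.095", "16:30:00.097", "16:30:00.122",
                             "16:30:00.255", "16:30:00.322", "16:30:00.420",
                             "16:30:00.569"]),
    "col": list("ABCDEFG"),
})
L = ["16:30:00.100", "16:30:00.200", "16:30:00.350", "16:30:00.450"]
cutoffs = pd.DataFrame({"cutoff": pd.to_timedelta(L), "label": L})

# direction="backward" matches each observation to the last cutoff <= its time;
# observations earlier than every cutoff get NaN in the right-hand columns.
matched = pd.merge_asof(obs, cutoffs, left_on="time", right_on="cutoff",
                        direction="backward")
matched["label"] = matched["label"].fillna("not possible")

# The labels sort lexicographically, which here also puts "not possible" last.
result = matched.groupby("label")["col"].apply(", ".join)
print(result)
```

Unlike the default right-closed pd.cut bins, this treats an observation stamped exactly at a cutoff as captured by that cutoff.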

Thanks for the help! One further thing, though: if L contains duplicates, is it possible to append col to the last instance of the duplicates, instead of using duplicates='drop'?

– Adrian Y
Apr 1 at 3:54

Also, is it possible to put each new element of col in the next cell, instead of separating them with commas?

– Adrian Y
Apr 1 at 4:07

@AdrianY - Unfortunately, L cannot contain duplicates for pd.cut. For the next cell, do you mean omitting df2 = df.groupby('b')['col'].apply(', '.join).reset_index()?

– jezrael
Apr 2 at 5:17
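Since pd.cut rejects duplicate edges, one possible workaround for both follow-up questions (not from the answer) is to assign positions with np.searchsorted(side="right"), which sends each observation to the last of any duplicated cutoffs, and then spread the observations into separate columns with groupby().cumcount() and pivot. The duplicated cutoff list and the names pos, slot and obs_1, obs_2 are hypothetical, chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical cutoff list with a duplicate: 16:30:00.200 appears twice.
L = ["16:30:00.100", "16:30:00.200", "16:30:00.200", "16:30:00.350", "16:30:00.450"]
times = ["16:30:00.095", "16:30:00.097", "16:30:00.122", "16:30:00.255",
         "16:30:00.322", "16:30:00.420", "16:30:00.569"]
df = pd.DataFrame({"time": pd.to_timedelta(times), "col": list("ABCDEFG")})

# side="right" returns the insertion point after any equal cutoffs, so subtracting 1
# gives the position of the LAST cutoff <= each time; -1 means no cutoff qualifies.
cutoffs = pd.to_timedelta(L).to_numpy()
df["pos"] = np.searchsorted(cutoffs, df["time"].to_numpy(), side="right") - 1

# Number the observations within each cutoff (0, 1, 2, ...) and spread them so
# every observation lands in its own column instead of a comma-joined string.
df["slot"] = df.groupby("pos").cumcount()
wide = df.pivot(index="pos", columns="slot", values="col")

# Keep every cutoff position (even empty ones) and put "not possible" last.
wide = wide.reindex(list(range(len(L))) + [-1])
wide.index = [f"{L[p]} (#{p})" if p >= 0 else "not possible" for p in wide.index]
wide.columns = [f"obs_{c + 1}" for c in wide.columns]
print(wide)
```

With this list, the first 16:30:00.200 row (#1) stays empty and D and E land under the last instance (#2).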
