Aggregate a tibble based on consecutive values in a boolean column

I've got a fairly straightforward problem, but I'm struggling to find a solution that doesn't require a wall of code and complicated loops.

I've got a summary table, df, for an hourly time series dataset where each observation belongs to a group.
I want to merge some of those groups, based on a boolean column in the summary table.
The boolean column, merge_with_next, indicates whether a given group should be merged with the next group (one row down).
The merging effectively occurs by updating the end value and removing rows:

library(dplyr)

# Demo data
df <- tibble(
  group = 1:12,
  start = seq.POSIXt(as.POSIXct("2019-01-01 00:00"), as.POSIXct("2019-01-12 00:00"), by = "1 day"),
  end = seq.POSIXt(as.POSIXct("2019-01-01 23:59"), as.POSIXct("2019-01-12 23:59"), by = "1 day"),
  merge_with_next = rep(c(TRUE, TRUE, FALSE), 4)
)

df
#> # A tibble: 12 x 4
#>    group start               end                 merge_with_next
#>    <int> <dttm>              <dttm>              <lgl>
#>  1     1 2019-01-01 00:00:00 2019-01-01 23:59:00 TRUE
#>  2     2 2019-01-02 00:00:00 2019-01-02 23:59:00 TRUE
#>  3     3 2019-01-03 00:00:00 2019-01-03 23:59:00 FALSE
#>  4     4 2019-01-04 00:00:00 2019-01-04 23:59:00 TRUE
#>  5     5 2019-01-05 00:00:00 2019-01-05 23:59:00 TRUE
#>  6     6 2019-01-06 00:00:00 2019-01-06 23:59:00 FALSE
#>  7     7 2019-01-07 00:00:00 2019-01-07 23:59:00 TRUE
#>  8     8 2019-01-08 00:00:00 2019-01-08 23:59:00 TRUE
#>  9     9 2019-01-09 00:00:00 2019-01-09 23:59:00 FALSE
#> 10    10 2019-01-10 00:00:00 2019-01-10 23:59:00 TRUE
#> 11    11 2019-01-11 00:00:00 2019-01-11 23:59:00 TRUE
#> 12    12 2019-01-12 00:00:00 2019-01-12 23:59:00 FALSE

# Desired result
desired <- tibble(
  group = c(1, 4, 7, 10),
  start = c("2019-01-01 00:00", "2019-01-04 00:00", "2019-01-07 00:00", "2019-01-10 00:00"),
  end = c("2019-01-03 23:59", "2019-01-06 23:59", "2019-01-09 23:59", "2019-01-12 23:59")
)

desired
#> # A tibble: 4 x 3
#>   group start            end
#>   <dbl> <chr>            <chr>
#> 1     1 2019-01-01 00:00 2019-01-03 23:59
#> 2     4 2019-01-04 00:00 2019-01-06 23:59
#> 3     7 2019-01-07 00:00 2019-01-09 23:59
#> 4    10 2019-01-10 00:00 2019-01-12 23:59

Created on 2019-03-22 by the reprex package (v0.2.1)

I'm looking for a short and clear solution that doesn't involve a myriad of helper tables and loops. The final value in the group column is not significant; I only care about the start and end columns of the result.

asked Mar 22 at 10:12 by djfinnoy
edited Mar 22 at 10:39 by markus

r dplyr tibble

1 Answer

We can use dplyr to build a grouping key that increases by one on every row whose previous row has merge_with_next equal to FALSE, so each run of rows to be merged shares one key. Within each group we then take the first value of start and the last value of end.

library(dplyr)

df %>%
  # A new block starts on every row whose predecessor had merge_with_next == FALSE
  group_by(temp = cumsum(!lag(merge_with_next, default = TRUE))) %>%
  # Collapse each block to its first group id, first start and last end
  summarise(group = first(group),
            start = first(start),
            end = last(end)) %>%
  ungroup() %>%
  select(-temp)

#   group start               end
#   <int> <dttm>              <dttm>
# 1     1 2019-01-01 00:00:00 2019-01-03 23:59:00
# 2     4 2019-01-04 00:00:00 2019-01-06 23:59:00
# 3     7 2019-01-07 00:00:00 2019-01-09 23:59:00
# 4    10 2019-01-10 00:00:00 2019-01-12 23:59:00
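
To see why the grouping key separates the blocks, it can help to build it one step at a time. The sketch below is only an illustration, not part of the original pipeline: the intermediate names prev_flag, starts_new and key are hypothetical, and the flag vector is copied from the question's demo data.

```r
library(dplyr)

# Same flag pattern as the question's demo data
merge_with_next <- rep(c(TRUE, TRUE, FALSE), 4)

# Flag of the previous row; the first row has no predecessor, so
# default = TRUE keeps it from opening an extra block
prev_flag <- lag(merge_with_next, default = TRUE)

# A row opens a new block when the previous row said "do not merge with next"
starts_new <- !prev_flag

# Running count of block starts: one id per merged block
key <- cumsum(starts_new)
# key: 0 0 0 1 1 1 2 2 2 3 3 3

# Side-by-side view of the intermediate columns (hypothetical helper table)
steps <- tibble(group = 1:12, merge_with_next, prev_flag, starts_new, key)
steps
```

Rows 1 to 3 share key 0, rows 4 to 6 share key 1, and so on, which is the grouping that summarise() then collapses. If the last row had merge_with_next equal to TRUE, it would simply close the final block, since there is no following row to open a new one.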

answered Mar 22 at 10:20 by Ronak Shah

Perfect! Thank you very much. – djfinnoy, Mar 22 at 10:28
