


Aggregate a tibble based on consecutive values in a boolean column


















I've got a fairly straightforward problem, but I'm struggling to find a solution that doesn't require a wall of code and complicated loops.

I've got a summary table, df, for an hourly time series dataset where each observation belongs to a group.
I want to merge some of those groups, based on a boolean column in the summary table.
The boolean column, merge_with_next, indicates whether a given group should be merged with the next group (one row down).
The merging effectively occurs by updating the end value and removing rows:



library(dplyr)

# Demo data
df <- tibble(
  group = 1:12,
  start = seq.POSIXt(as.POSIXct("2019-01-01 00:00"), as.POSIXct("2019-01-12 00:00"), by = "1 day"),
  end = seq.POSIXt(as.POSIXct("2019-01-01 23:59"), as.POSIXct("2019-01-12 23:59"), by = "1 day"),
  merge_with_next = rep(c(TRUE, TRUE, FALSE), 4)
)

df
#> # A tibble: 12 x 4
#>    group start               end                 merge_with_next
#>    <int> <dttm>              <dttm>              <lgl>
#>  1     1 2019-01-01 00:00:00 2019-01-01 23:59:00 TRUE
#>  2     2 2019-01-02 00:00:00 2019-01-02 23:59:00 TRUE
#>  3     3 2019-01-03 00:00:00 2019-01-03 23:59:00 FALSE
#>  4     4 2019-01-04 00:00:00 2019-01-04 23:59:00 TRUE
#>  5     5 2019-01-05 00:00:00 2019-01-05 23:59:00 TRUE
#>  6     6 2019-01-06 00:00:00 2019-01-06 23:59:00 FALSE
#>  7     7 2019-01-07 00:00:00 2019-01-07 23:59:00 TRUE
#>  8     8 2019-01-08 00:00:00 2019-01-08 23:59:00 TRUE
#>  9     9 2019-01-09 00:00:00 2019-01-09 23:59:00 FALSE
#> 10    10 2019-01-10 00:00:00 2019-01-10 23:59:00 TRUE
#> 11    11 2019-01-11 00:00:00 2019-01-11 23:59:00 TRUE
#> 12    12 2019-01-12 00:00:00 2019-01-12 23:59:00 FALSE

# Desired result
desired <- tibble(
  group = c(1, 4, 7, 9),
  start = c("2019-01-01 00:00", "2019-01-04 00:00", "2019-01-07 00:00", "2019-01-10 00:00"),
  end = c("2019-01-03 23:59", "2019-01-06 23:59", "2019-01-09 23:59", "2019-01-12 23:59")
)

desired
#> # A tibble: 4 x 3
#>   group start            end
#>   <dbl> <chr>            <chr>
#> 1     1 2019-01-01 00:00 2019-01-03 23:59
#> 2     4 2019-01-04 00:00 2019-01-06 23:59
#> 3     7 2019-01-07 00:00 2019-01-09 23:59
#> 4     9 2019-01-10 00:00 2019-01-12 23:59


Created on 2019-03-22 by the reprex package (v0.2.1)



I'm looking for a short and clear solution that doesn't involve a myriad of helper tables and loops. The final value in the group column is not significant; I only care about the start and end columns in the result.

















      r dplyr tibble














      edited Mar 22 at 10:39









      markus











      asked Mar 22 at 10:12









djfinnoy























          1 Answer














We can use dplyr to start a new group after every row where merge_with_next is FALSE, and then take the first value of start and the last value of end for each group.



library(dplyr)

df %>%
  # temp increases by one on each row that follows a FALSE in merge_with_next
  group_by(temp = cumsum(!lag(merge_with_next, default = TRUE))) %>%
  summarise(group = first(group),
            start = first(start),
            end = last(end)) %>%
  ungroup() %>%
  select(-temp)

#   group start               end
#   <int> <dttm>              <dttm>
# 1     1 2019-01-01 00:00:00 2019-01-03 23:59:00
# 2     4 2019-01-04 00:00:00 2019-01-06 23:59:00
# 3     7 2019-01-07 00:00:00 2019-01-09 23:59:00
# 4    10 2019-01-10 00:00:00 2019-01-12 23:59:00
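
To see why the grouping key works, it may help to build it one step at a time on the flag vector alone. This is only a sketch: the names prev_flag, starts_new_run and run_id are illustrative and do not appear in the code above, and the flags are re-created here so the snippet runs on its own.

```r
library(dplyr)

# Same flag pattern as the demo data in the question
merge_with_next <- rep(c(TRUE, TRUE, FALSE), 4)

# Flag of the previous row; the first row has no predecessor,
# so default = TRUE keeps it from opening an extra group
prev_flag <- lag(merge_with_next, default = TRUE)

# A new run starts on every row whose predecessor was FALSE
starts_new_run <- !prev_flag

# The running count of run starts gives one id per merged block
run_id <- cumsum(starts_new_run)

print(run_id)
#> [1] 0 0 0 1 1 1 2 2 2 3 3 3
```

Rows that share a run_id are the rows that end up merged, which is why first(start) and last(end) within each group give the merged interval.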















          answered Mar 22 at 10:20









Ronak Shah













          • Perfect! Thank you very much.

            – djfinnoy
            Mar 22 at 10:28
















