Organizing a dataframe - splitting one column into three



I have a dataset that looks like this:

    Ord_ID  Supplier  Trans_Type  Date
    1       A         PO          2/3/18
    1       A         Receipt     2/15/18
    2       B         PO          2/4/18
    2       B         Receipt     3/13/18
    3       C         PO          2/7/18
    3       C         Receipt     3/1/18
    3       C         Receipt     3/5/18
    3       C         Receipt     3/29/18
    4       B         PO          2/9/18
    4       B         Receipt     2/20/18
    4       B         Receipt     2/27/18
    5       D         PO          2/18/18
    5       D         Receipt     4/2/18

Basically, I need to separate the Date column into 3 different columns: a PO_Date column, a column with the earliest receipt date for each order, and a column with the last receipt date for each order. For orders with only one receipt date, the 2nd and 3rd columns should be the same. I've tried using spread(), but I guess it didn't work because the number of Receipt dates varies from order to order. How can I make this happen?

Desired result:

    Ord_ID  Supplier  PO_Date  First_Receipt_Date  Last_Receipt_Date
    1       A         2/3/18   2/15/18             2/15/18
    2       B         2/4/18   3/13/18             3/13/18
    3       C         2/7/18   3/1/18              3/29/18
    4       B         2/9/18   2/20/18             2/27/18
    5       D         2/18/18  4/2/18              4/2/18

Tags: r, dataframe, spread

asked Mar 21 at 21:37 by Millie

5 Answers

Using dplyr. First, make sure the Date column is in date format. Assume the data frame is named mydata:

    library(dplyr)

    # Parse month/day/two-digit-year strings into Date objects
    mydata <- mydata %>%
      mutate(Date = as.Date(Date, "%m/%d/%y"))

Now you can filter for Receipt, calculate the min/max dates, then filter the original data for PO and join the two together:

    mydata %>%
      filter(Trans_Type == "Receipt") %>%
      group_by(Ord_ID, Supplier) %>%
      summarise(First_Receipt_Date = min(Date),
                Last_Receipt_Date = max(Date)) %>%
      ungroup() %>%
      # Name the join keys explicitly rather than relying on shared column names
      left_join(filter(mydata, Trans_Type == "PO"), by = c("Ord_ID", "Supplier")) %>%
      select(Ord_ID, Supplier, PO_Date = Date, First_Receipt_Date, Last_Receipt_Date)

Result:

      Ord_ID Supplier PO_Date    First_Receipt_Date Last_Receipt_Date
       <int> <chr>    <date>     <date>             <date>
    1      1 A        2018-02-03 2018-02-15         2018-02-15
    2      2 B        2018-02-04 2018-03-13         2018-03-13
    3      3 C        2018-02-07 2018-03-01         2018-03-29
    4      4 B        2018-02-09 2018-02-20         2018-02-27
    5      5 D        2018-02-18 2018-04-02         2018-04-02
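For anyone who hits a join error with this approach, here is a minimal self-contained sketch that rebuilds the sample data from the question and checks which columns the two tables share before joining. The names `orders`, `receipts`, `purchase_orders` and `shared_keys` are placeholders of mine, not from the original answer.

```r
library(dplyr)

# Sample data transcribed from the question; `orders` is a hypothetical name.
orders <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Ord_ID Supplier Trans_Type Date
1 A PO 2/3/18
1 A Receipt 2/15/18
2 B PO 2/4/18
2 B Receipt 3/13/18
3 C PO 2/7/18
3 C Receipt 3/1/18
3 C Receipt 3/5/18
3 C Receipt 3/29/18
4 B PO 2/9/18
4 B Receipt 2/20/18
4 B Receipt 2/27/18
5 D PO 2/18/18
5 D Receipt 4/2/18
")
orders$Date <- as.Date(orders$Date, format = "%m/%d/%y")

# Earliest and latest receipt date per order
receipts <- orders %>%
  filter(Trans_Type == "Receipt") %>%
  group_by(Ord_ID, Supplier) %>%
  summarise(First_Receipt_Date = min(Date),
            Last_Receipt_Date = max(Date)) %>%
  ungroup()

# One row per order holding the PO date
purchase_orders <- orders %>%
  filter(Trans_Type == "PO") %>%
  select(Ord_ID, Supplier, PO_Date = Date)

# Columns present in both tables; an empty result here would explain
# a "no common variables" error from a join without `by`
shared_keys <- intersect(names(receipts), names(purchase_orders))
print(shared_keys)

result <- receipts %>%
  left_join(purchase_orders, by = shared_keys) %>%
  select(Ord_ID, Supplier, PO_Date, First_Receipt_Date, Last_Receipt_Date)

print(result)
```

If `shared_keys` comes back empty on your own data, the column names probably differ between the two tables (for example through stray whitespace or different capitalisation), and that is worth checking before anything else.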





• When I run this I get this error: "Error: by required, because the data sources have no common variables" – Millie, Mar 22 at 18:36

• Works for me using the example data in the question: the join is on Ord_ID and Supplier. – neilfws, Mar 23 at 3:06

• Got it to work this time. Not sure what was wrong last week; I updated packages earlier which might have done the trick. – Millie, Mar 25 at 22:05

With tidyverse, borrowing @divibisan's sample data:

    library(tidyverse)

    df %>%
      group_by(Ord_ID, Supplier) %>%
      slice(c(1:2, n())) %>%
      mutate(Trans_Type = c("PO_Date", "First_Receipt_Date", "Last_Receipt_Date")) %>%
      spread(Trans_Type, Date) %>%
      ungroup()

    # # A tibble: 5 x 5
    #   Ord_ID Supplier First_Receipt_Date Last_Receipt_Date PO_Date
    #    <int> <fct>    <date>             <date>            <date>
    # 1      1 A        2018-02-15         2018-02-15        2018-02-03
    # 2      2 B        2018-03-13         2018-03-13        2018-02-04
    # 3      3 C        2018-03-01         2018-03-29        2018-02-07
    # 4      4 B        2018-02-20         2018-02-27        2018-02-09
    # 5      5 D        2018-04-02         2018-04-02        2018-02-18

If the data is not sorted as in the sample data, add %>% arrange(Trans_Type, Date) as a first step.
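To illustrate the sorting note, here is a small sketch that deliberately shuffles the sample rows and then sorts them with arrange() before slicing. The `orders` and `shuffled` names and the seed are my own assumptions; the pipeline itself is the one from this answer.

```r
library(dplyr)
library(tidyr)

# Sample data transcribed from the question; `orders` is a hypothetical name.
orders <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Ord_ID Supplier Trans_Type Date
1 A PO 2/3/18
1 A Receipt 2/15/18
2 B PO 2/4/18
2 B Receipt 3/13/18
3 C PO 2/7/18
3 C Receipt 3/1/18
3 C Receipt 3/5/18
3 C Receipt 3/29/18
4 B PO 2/9/18
4 B Receipt 2/20/18
4 B Receipt 2/27/18
5 D PO 2/18/18
5 D Receipt 4/2/18
")
orders$Date <- as.Date(orders$Date, format = "%m/%d/%y")

# Put the rows in a random order to mimic unsorted input
set.seed(1)
shuffled <- orders[sample(nrow(orders)), ]

result <- shuffled %>%
  # "PO" sorts before "Receipt", so within each order the PO row comes
  # first, followed by the receipts from earliest to latest
  arrange(Trans_Type, Date) %>%
  group_by(Ord_ID, Supplier) %>%
  # Row 1 = PO, row 2 = first receipt, row n() = last receipt
  slice(c(1:2, n())) %>%
  mutate(Trans_Type = c("PO_Date", "First_Receipt_Date", "Last_Receipt_Date")) %>%
  spread(Trans_Type, Date) %>%
  ungroup()

print(result)
```

This still relies on every order having exactly one "PO" row and at least one "Receipt" row, so it is a sketch under those assumptions rather than a general solution.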






• I would recommend not using slice here: it is not reproducible, because you do not know the order of the data. At the least, use arrange beforehand. A better way would be a combination of filter with first and last, I guess. – Sébastien Rochette, Mar 22 at 14:32

• If the data is not sorted, you can add %>% arrange(Trans_Type, Date) as a first step. Given the shape of the sample data, I assumed it was fair to assume it is sorted. Other assumptions are that there is always a "PO" value, that there are no values other than "PO" and "Receipt", that the order of the output columns isn't important, etc. – Moody_Mudskipper, Mar 22 at 17:15

• I don't understand the point about filter with first and last; I use slice in its precise intended use case IMO. – Moody_Mudskipper, Mar 22 at 17:18

• I know this is correct with your assumptions in this specific case. But because you never know how a dataset was built, nor how it will be updated later, I do not recommend using indices to select or filter rows. There should be a reason, grounded in the data itself, for why you chose these specific rows; here, they are the smallest and the largest values. I try to always think about the future use of my scripts. This is a personal recommendation. – Sébastien Rochette, Mar 22 at 17:22

• That's a legitimate point, but there's also value in concise code, and on SO you're rarely 100% explicit about assumptions anyway, so it's a gray area. Actually, my first answer had the arrange part but I edited it out (not showing in the edit history as I edited right away). I added a note at the end of my post as a compromise :). – Moody_Mudskipper, Mar 22 at 17:29

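I would start with something like this:

    library(dplyr)

    # Group by order as well as supplier, so that two orders from the
    # same supplier (e.g. orders 2 and 4 from B) are not merged
    data %>%
      group_by(Ord_ID, Supplier, Trans_Type) %>%
      summarise(min_date = min(Date),
                max_date = max(Date)) %>%
      ungroup()

Then, you can play with gather and spread to retrieve the columns you need.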
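One possible way to finish the reshaping step is sketched below. Note that it swaps in pivot_wider(), the successor to spread() in tidyr 1.0.0 and later, instead of the gather/spread pair mentioned above, and that it groups by Ord_ID as well so each order gets its own row. The `orders`, `per_type` and `result` names are placeholders.

```r
library(dplyr)
library(tidyr)

# Sample data transcribed from the question; `orders` is a hypothetical name.
orders <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Ord_ID Supplier Trans_Type Date
1 A PO 2/3/18
1 A Receipt 2/15/18
2 B PO 2/4/18
2 B Receipt 3/13/18
3 C PO 2/7/18
3 C Receipt 3/1/18
3 C Receipt 3/5/18
3 C Receipt 3/29/18
4 B PO 2/9/18
4 B Receipt 2/20/18
4 B Receipt 2/27/18
5 D PO 2/18/18
5 D Receipt 4/2/18
")
orders$Date <- as.Date(orders$Date, format = "%m/%d/%y")

# Min and max date per order and transaction type (long form)
per_type <- orders %>%
  group_by(Ord_ID, Supplier, Trans_Type) %>%
  summarise(min_date = min(Date),
            max_date = max(Date)) %>%
  ungroup()

# Widen to one row per order; the new columns are named
# min_date_PO, min_date_Receipt, max_date_PO and max_date_Receipt
result <- per_type %>%
  pivot_wider(names_from = Trans_Type,
              values_from = c(min_date, max_date)) %>%
  select(Ord_ID, Supplier,
         PO_Date = min_date_PO,
         First_Receipt_Date = min_date_Receipt,
         Last_Receipt_Date = max_date_Receipt)

print(result)
```

Since each order has a single PO row, its min and max are the same date, so picking min_date_PO for PO_Date is an arbitrary but harmless choice.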







Here's another tidyverse-based solution that avoids the left_join. I have no idea which approach would be faster on a large dataset, but it's always good to have more options:

    library(tidyverse)

    df <- structure(list(
      Ord_ID = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L),
      Supplier = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 4L, 4L),
                           .Label = c("A", "B", "C", "D"), class = "factor"),
      Trans_Type = c("PO", "Receipt", "PO", "Receipt", "PO", "Receipt", "Receipt",
                     "Receipt", "PO", "Receipt", "Receipt", "PO", "Receipt"),
      Date = structure(c(17565, 17577, 17566, 17603, 17569, 17591, 17595, 17619,
                         17571, 17582, 17589, 17580, 17623), class = "Date")),
      row.names = c(NA, -13L), class = "data.frame")

    df %>%
      group_by(Ord_ID, Supplier, Trans_Type) %>%
      # Keep only min and max date values
      filter(Date == min(Date) | Date == max(Date) | Trans_Type != 'Receipt') %>%
      # Rename 2nd Receipt value Receipt_2 so there are no duplicated values
      mutate(Trans_Type2 = if_else(Trans_Type == 'Receipt' & row_number() == 2,
                                   'Receipt_2', Trans_Type)) %>%
      # Drop Trans_Type variable (we can't replace in mutate since it's a grouping var)
      ungroup(Trans_Type) %>%
      select(-Trans_Type) %>%
      # Spread the now unduplicated Trans_Type values
      spread(Trans_Type2, Date) %>%
      # Fill in Receipt_2 values where they're missing
      mutate(Receipt_2 = if_else(is.na(Receipt_2), Receipt, Receipt_2))

    # A tibble: 5 x 5
      Ord_ID Supplier PO         Receipt    Receipt_2
       <int> <fct>    <date>     <date>     <date>
    1      1 A        2018-02-03 2018-02-15 2018-02-15
    2      2 B        2018-02-04 2018-03-13 2018-03-13
    3      3 C        2018-02-07 2018-03-01 2018-03-29
    4      4 B        2018-02-09 2018-02-20 2018-02-27
    5      5 D        2018-02-18 2018-04-02 2018-04-02
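As a side note on the last step, filling a missing second receipt date from the first one can also be written with dplyr's coalesce(), which returns the first non-missing value at each position. The tiny vectors below are made up for illustration only.

```r
library(dplyr)

# Hypothetical first and last receipt dates for two orders;
# the first order has no separate second receipt
first_receipt <- as.Date(c("2018-02-15", "2018-03-01"))
last_receipt <- as.Date(c(NA, "2018-03-29"))

# Take last_receipt where it exists, otherwise fall back to first_receipt
filled <- coalesce(last_receipt, first_receipt)
print(filled)
```

In the pipeline above, that would presumably correspond to mutate(Receipt_2 = coalesce(Receipt_2, Receipt)), though I have only checked it on the toy vectors here.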






You can just use dplyr to mutate new columns for the PO date and the first and last receipt dates:

    library(dplyr)
    library(lubridate)  # for mdy()

    test1 <- test %>%
      mutate(Date = mdy(Date)) %>%
      group_by(Ord_ID) %>%
      mutate(PO_Date = ifelse(Trans_Type == "PO", Date, NA),
             Receipt_Date_First = min(Date[Trans_Type == "Receipt"]),
             Receipt_Date_Last = max(Date[Trans_Type == "Receipt"])) %>%
      filter(!is.na(PO_Date)) %>%
      mutate(PO_Date = as.Date(as.numeric(PO_Date), origin = "1970-01-01"))

A breakdown:

    test1 <- test %>%

      # convert the "Date" column to class Date so that min and max work on dates
      mutate(Date = mdy(Date)) %>%

      # group by the order ID
      group_by(Ord_ID) %>%

      # PO_Date is the date on rows where "Trans_Type" is "PO"; ifelse() drops
      # the Date class and returns plain numbers, which is fixed in the last step
      mutate(PO_Date = ifelse(Trans_Type == "PO", Date, NA),

             # first receipt date is the minimum date of a receipt transaction
             Receipt_Date_First = min(Date[Trans_Type == "Receipt"]),

             # last receipt date is the maximum date of a receipt transaction
             Receipt_Date_Last = max(Date[Trans_Type == "Receipt"])) %>%

      # keep only the PO row of each order, which removes the duplicates
      filter(!is.na(PO_Date)) %>%

      # convert "PO_Date" back from numeric (days since 1970-01-01) to Date
      mutate(PO_Date = as.Date(as.numeric(PO_Date), origin = "1970-01-01"))
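A variation on the same idea, sketched here under the assumption that every order has exactly one "PO" row, is to compute all three dates in a single summarise(). Subsetting a Date vector keeps its class, so no round trip through numeric is needed. The `orders` and `result` names are placeholders, and base as.Date() is used instead of mdy() only to keep the example to one package.

```r
library(dplyr)

# Sample data transcribed from the question; `orders` is a hypothetical name.
orders <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Ord_ID Supplier Trans_Type Date
1 A PO 2/3/18
1 A Receipt 2/15/18
2 B PO 2/4/18
2 B Receipt 3/13/18
3 C PO 2/7/18
3 C Receipt 3/1/18
3 C Receipt 3/5/18
3 C Receipt 3/29/18
4 B PO 2/9/18
4 B Receipt 2/20/18
4 B Receipt 2/27/18
5 D PO 2/18/18
5 D Receipt 4/2/18
")
orders$Date <- as.Date(orders$Date, format = "%m/%d/%y")

result <- orders %>%
  group_by(Ord_ID, Supplier) %>%
  summarise(
    # The (assumed single) PO date of the order
    PO_Date = Date[Trans_Type == "PO"][1],
    # Earliest and latest receipt dates of the order
    First_Receipt_Date = min(Date[Trans_Type == "Receipt"]),
    Last_Receipt_Date = max(Date[Trans_Type == "Receipt"])
  ) %>%
  ungroup()

print(result)
```

An order with no receipts at all would make min() and max() return infinite values with a warning, so real data may need a guard for that case.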





First answer (dplyr with left_join): answered Mar 21 at 22:06 by neilfws












                • When I run this I get this error: "Error: by required, because the data sources have no common variables"

                  – Millie
                  Mar 22 at 18:36












                • Works for me using the example data in the question: the join is on Ord_ID and Supplier.

                  – neilfws
                  Mar 23 at 3:06












                • Got it to work this time. Not sure what was wrong last week; I updated packages earlier which might have done the trick.

                  – Millie
                  Mar 25 at 22:05

















                • When I run this I get this error: "Error: by required, because the data sources have no common variables"

                  – Millie
                  Mar 22 at 18:36












                • Works for me using the example data in the question: the join is on Ord_ID and Supplier.

                  – neilfws
                  Mar 23 at 3:06












                • Got it to work this time. Not sure what was wrong last week; I updated packages earlier which might have done the trick.

                  – Millie
                  Mar 25 at 22:05
















                When I run this I get this error: "Error: by required, because the data sources have no common variables"

                – Millie
                Mar 22 at 18:36






                When I run this I get this error: "Error: by required, because the data sources have no common variables"

                – Millie
                Mar 22 at 18:36














                Works for me using the example data in the question: the join is on Ord_ID and Supplier.

                – neilfws
                Mar 23 at 3:06






                Works for me using the example data in the question: the join is on Ord_ID and Supplier.

                – neilfws
                Mar 23 at 3:06














                Got it to work this time. Not sure what was wrong last week; I updated packages earlier which might have done the trick.

                – Millie
                Mar 25 at 22:05





                Got it to work this time. Not sure what was wrong last week; I updated packages earlier which might have done the trick.

                – Millie
                Mar 25 at 22:05













                2














                With tidyverse, borrowing @divibisan's sample data :



                library(tidyverse)

                df %>%
                group_by(Ord_ID, Supplier) %>%
                slice(c(1:2, n())) %>%
                mutate(Trans_Type = c("PO_Date","First_Receipt_Date","Last_Receipt_Date")) %>%
                spread(Trans_Type, Date) %>%
                ungroup()

                # # A tibble: 5 x 5
                # Ord_ID Supplier First_Receipt_Date Last_Receipt_Date PO_Date
                # <int> <fct> <date> <date> <date>
                # 1 1 A 2018-02-15 2018-02-15 2018-02-03
                # 2 2 B 2018-03-13 2018-03-13 2018-02-04
                # 3 3 C 2018-03-01 2018-03-29 2018-02-07
                # 4 4 B 2018-02-20 2018-02-27 2018-02-09
                # 5 5 D 2018-04-02 2018-04-02 2018-02-18


                If the data is not sorted as in the sample data, add %>% arrange(Trans_Type, Date) as a first step.






                share|improve this answer

























                • I would recommend not to use slice, it is not reproducible as you do not know the order of data. At least, use arrange before. A better way would be a combination of filter with first and last I guess.

                  – Sébastien Rochette
                  Mar 22 at 14:32











                • If data is not sorted, you can add %>% arrange(Trans_Type, Date) as a first step. given the shape of the sample data I assumed it was fair to assume it is sorted. Other assumptions are that there is always a "PO" value, that there are no other values than "PO" and "Receipt", that order of the output columns wasn't important etc...

                  – Moody_Mudskipper
                  Mar 22 at 17:15











                • I don't understand the point on filter with first and last, I use slice in its precise intended use case IMO.

                  – Moody_Mudskipper
                  Mar 22 at 17:18











                • I know this is correct with your assumptions in this specific case. But because you never know how a dataset is built, nor can you know how it will be later updated, I do not recommend the use of indices for the selection/filtering of datasets. There must be a better explanation, included in the data, as why you chose these specific lines. Here, these are the smallest and the biggest values. I try to always think about the future use of my scripts. This is a personal recommendation.

                  – Sébastien Rochette
                  Mar 22 at 17:22







                • 1





                  That's a legitimate point, but there's also value in concise code, and on SO you're rarely 100% explicit about assumptions anyway so it's a gray area. Actually my first answer had the arrange part but I edited it out (not showing in edit history as i edited right away). I added a note at the end of my post as a compromise :).

                  – Moody_Mudskipper
                  Mar 22 at 17:29















                2














                With tidyverse, borrowing @divibisan's sample data:

                library(tidyverse)

                df %>%
                  group_by(Ord_ID, Supplier) %>%
                  slice(c(1:2, n())) %>%
                  mutate(Trans_Type = c("PO_Date", "First_Receipt_Date", "Last_Receipt_Date")) %>%
                  spread(Trans_Type, Date) %>%
                  ungroup()

                # # A tibble: 5 x 5
                #   Ord_ID Supplier First_Receipt_Date Last_Receipt_Date PO_Date
                #    <int> <fct>    <date>             <date>            <date>
                # 1      1 A        2018-02-15         2018-02-15        2018-02-03
                # 2      2 B        2018-03-13         2018-03-13        2018-02-04
                # 3      3 C        2018-03-01         2018-03-29        2018-02-07
                # 4      4 B        2018-02-20         2018-02-27        2018-02-09
                # 5      5 D        2018-04-02         2018-04-02        2018-02-18

                If the data is not sorted as in the sample data, add %>% arrange(Trans_Type, Date) as a first step.
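To illustrate the note about sorting, here is a minimal sketch of the same pipeline with the arrange step added. The data frame df_unsorted and its values are made up for illustration (they are not the sample data from the question); the rows are deliberately shuffled so that slice would pick the wrong rows without the sort.

```r
library(dplyr)
library(tidyr)

# Hypothetical, deliberately unsorted data in the same shape as the sample data
df_unsorted <- data.frame(
  Ord_ID     = c(3L, 3L, 1L, 3L, 1L, 3L),
  Supplier   = c("C", "C", "A", "C", "A", "C"),
  Trans_Type = c("Receipt", "PO", "Receipt", "Receipt", "PO", "Receipt"),
  Date       = as.Date(c("2018-03-29", "2018-02-07", "2018-02-15",
                         "2018-03-01", "2018-02-03", "2018-03-05"))
)

res <- df_unsorted %>%
  # Sort so that, within each order, the PO row comes first and the
  # receipts follow in chronological order
  arrange(Trans_Type, Date) %>%
  group_by(Ord_ID, Supplier) %>%
  # Rows 1 and 2 are the PO and the first receipt; row n() is the last receipt
  slice(c(1:2, n())) %>%
  mutate(Trans_Type = c("PO_Date", "First_Receipt_Date", "Last_Receipt_Date")) %>%
  spread(Trans_Type, Date) %>%
  ungroup()

print(res)
```

This still relies on the other assumptions of the answer: every order has exactly one "PO" row and at least one "Receipt" row, and there are no other Trans_Type values.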































                • I would recommend not using slice: it is not reproducible, as you do not know the order of the data. At the very least, use arrange first. A better way would be a combination of filter with first and last, I guess.

                  – Sébastien Rochette
                  Mar 22 at 14:32











                • If the data is not sorted, you can add %>% arrange(Trans_Type, Date) as a first step. Given the shape of the sample data, I thought it was fair to assume it is sorted. Other assumptions are that there is always a "PO" value, that there are no values other than "PO" and "Receipt", that the order of the output columns isn't important, etc.

                  – Moody_Mudskipper
                  Mar 22 at 17:15











                • I don't understand the point about filter with first and last; IMO I am using slice for precisely its intended use case.

                  – Moody_Mudskipper
                  Mar 22 at 17:18











                • I know this is correct under your assumptions in this specific case. But because you never know how a dataset was built, or how it will be updated later, I do not recommend using indices to select or filter rows. There should be a better explanation, contained in the data itself, of why you chose these specific rows. Here, they are the smallest and the largest values. I always try to think about the future use of my scripts. This is a personal recommendation.

                  – Sébastien Rochette
                  Mar 22 at 17:22







                • That's a legitimate point, but there's also value in concise code, and on SO you're rarely 100% explicit about assumptions anyway, so it's a gray area. Actually, my first answer had the arrange part but I edited it out (it doesn't show in the edit history as I edited right away). I added a note at the end of my post as a compromise :).

                  – Moody_Mudskipper
                  Mar 22 at 17:29













                edited Mar 22 at 17:25

























                answered Mar 22 at 9:58









                Moody_Mudskipper












                1














                I would start with something like this:

                data %>%
                  group_by(Supplier, Trans_Type) %>%
                  summarise(min_date = min(Date),
                            max_date = max(Date)) %>%
                  ungroup()

                Then, you can play with gather and spread to retrieve the columns you need.
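One possible way to finish this with gather and spread is sketched below. It is an illustration under assumptions, not the answerer's own code: Ord_ID is added to the grouping (in the sample data one supplier can have several orders), the small data frame is made up, and the helper column names stat and col are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical data in the same shape as the question's data
data <- data.frame(
  Ord_ID     = c(1L, 1L, 3L, 3L, 3L, 3L),
  Supplier   = c("A", "A", "C", "C", "C", "C"),
  Trans_Type = c("PO", "Receipt", "PO", "Receipt", "Receipt", "Receipt"),
  Date       = as.Date(c("2018-02-03", "2018-02-15", "2018-02-07",
                         "2018-03-01", "2018-03-05", "2018-03-29"))
)

wide <- data %>%
  group_by(Ord_ID, Supplier, Trans_Type) %>%
  summarise(min_date = min(Date),
            max_date = max(Date)) %>%
  ungroup() %>%
  # Long format: one row per order, transaction type and statistic
  gather(key = "stat", value = "Date", min_date, max_date) %>%
  # Combine into a single key such as "Receipt_min_date"
  unite("col", Trans_Type, stat) %>%
  # Wide format: one column per key
  spread(col, Date) %>%
  # Keep and rename the three columns of interest
  select(Ord_ID, Supplier,
         PO_Date            = PO_min_date,
         First_Receipt_Date = Receipt_min_date,
         Last_Receipt_Date  = Receipt_max_date)

print(wide)
```

Because the rows are chosen by min and max rather than by position, the result does not depend on the order of the input rows.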
















                    answered Mar 21 at 21:55









                    Sébastien Rochette





















                        0














                        Here's another tidyverse-based solution that avoids the left_join. I have no idea which approach would be faster on a large dataset, but it's always good to have more options:



                        library(tidyverse)

                        df <- structure(list(
                          Ord_ID = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L),
                          Supplier = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 4L, 4L),
                                               .Label = c("A", "B", "C", "D"), class = "factor"),
                          Trans_Type = c("PO", "Receipt", "PO", "Receipt", "PO", "Receipt", "Receipt",
                                         "Receipt", "PO", "Receipt", "Receipt", "PO", "Receipt"),
                          Date = structure(c(17565, 17577, 17566, 17603, 17569, 17591, 17595, 17619,
                                             17571, 17582, 17589, 17580, 17623), class = "Date")),
                          row.names = c(NA, -13L), class = "data.frame")

                        df %>%
                          group_by(Ord_ID, Supplier, Trans_Type) %>%
                          # Keep only min and max date values
                          filter(Date == min(Date) | Date == max(Date) | Trans_Type != 'Receipt') %>%
                          # Rename 2nd Receipt value Receipt_2 so there are no duplicated values
                          mutate(Trans_Type2 = if_else(Trans_Type == 'Receipt' & row_number() == 2,
                                                       'Receipt_2', Trans_Type)) %>%
                          # Drop Trans_Type variable (we can't replace in mutate since it's a grouping var)
                          ungroup(Trans_Type) %>%
                          select(-Trans_Type) %>%
                          # Spread the now unduplicated Trans_Type values
                          spread(Trans_Type2, Date) %>%
                          # Fill in Receipt_2 values where they're missing
                          mutate(Receipt_2 = if_else(is.na(Receipt_2), Receipt, Receipt_2))

                        # A tibble: 5 x 5
                          Ord_ID Supplier PO         Receipt    Receipt_2
                           <int> <fct>    <date>     <date>     <date>
                        1      1 A        2018-02-03 2018-02-15 2018-02-15
                        2      2 B        2018-02-04 2018-03-13 2018-03-13
                        3      3 C        2018-02-07 2018-03-01 2018-03-29
                        4      4 B        2018-02-09 2018-02-20 2018-02-27
                        5      5 D        2018-02-18 2018-04-02 2018-04-02















                            answered Mar 21 at 22:24









                            divibisan





















                                0














                                You can just use dplyr to mutate new columns for the PO date and the first and last receipt dates:

                                library(dplyr)
                                library(lubridate)  # for mdy()

                                test1 <- test %>%
                                  mutate(Date = mdy(Date)) %>%
                                  group_by(Ord_ID) %>%
                                  mutate(PO_Date = ifelse(Trans_Type == "PO", Date, NA),
                                         Receipt_Date_First = min(Date[Trans_Type == "Receipt"]),
                                         Receipt_Date_Last = max(Date[Trans_Type == "Receipt"])) %>%
                                  filter(!is.na(PO_Date)) %>%
                                  mutate(PO_Date = as.Date(as.numeric(PO_Date), origin = "1970-01-01"))


                                A breakdown:

                                test1 <- test %>%

                                  # convert the "Date" column to Date class so min and max dates can be identified
                                  mutate(Date = mdy(Date)) %>%

                                  # group by the order ID
                                  group_by(Ord_ID) %>%

                                  # PO_Date is the date where "Trans_Type" is "PO"; ifelse() drops the Date class,
                                  # so the result is numeric here and is converted back in the last step
                                  mutate(PO_Date = ifelse(Trans_Type == "PO", Date, NA),

                                         # first receipt date is the minimum date of a receipt transaction
                                         Receipt_Date_First = min(Date[Trans_Type == "Receipt"]),

                                         # last receipt date is the maximum date of a receipt transaction
                                         Receipt_Date_Last = max(Date[Trans_Type == "Receipt"])) %>%

                                  # keep only the PO row of each order, which removes the duplicates
                                  filter(!is.na(PO_Date)) %>%

                                  # convert the "PO_Date" column back from numeric to Date
                                  # (the explicit origin is required by as.Date() in older versions of R)
                                  mutate(PO_Date = as.Date(as.numeric(PO_Date), origin = "1970-01-01"))
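The round trip through numeric can be avoided. Base ifelse() strips attributes such as the Date class, whereas dplyr's if_else() keeps the class as long as both branches have the same type. The sketch below is a variant under assumptions, not the answer's original code: the small test data frame is made up, and its Date column is already of class Date, so the mdy() step is omitted.

```r
library(dplyr)

# Hypothetical data; Date is already a Date column here
test <- data.frame(
  Ord_ID     = c(1L, 1L, 3L, 3L, 3L),
  Trans_Type = c("PO", "Receipt", "PO", "Receipt", "Receipt"),
  Date       = as.Date(c("2018-02-03", "2018-02-15", "2018-02-07",
                         "2018-03-01", "2018-03-29"))
)

test1 <- test %>%
  group_by(Ord_ID) %>%
  # as.Date(NA) is a missing value of class Date, so both branches of
  # if_else() have the same type and PO_Date stays a Date column
  mutate(PO_Date = if_else(Trans_Type == "PO", Date, as.Date(NA)),
         Receipt_Date_First = min(Date[Trans_Type == "Receipt"]),
         Receipt_Date_Last = max(Date[Trans_Type == "Receipt"])) %>%
  # keep only the PO row of each order
  filter(!is.na(PO_Date)) %>%
  ungroup()

print(test1)
```

The result still carries the original Trans_Type and Date columns of the PO row; drop them with select() if they are not needed.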





                                share|improve this answer



























                                  0














                                  You can just use dplyr to mutate new columns for PO date, and first and last receipt dates:



                                  test1<-test %>%
                                  mutate(Date = mdy(Date)) %>%
                                  group_by(Ord_ID) %>%
                                  mutate(PO_Date = ifelse(Trans_Type == "PO", Date, NA),
                                  Receipt_Date_First = min(Date[Trans_Type=="Receipt"]),
                                  Receipt_Date_Last = max(Date[Trans_Type=="Receipt"])) %>%
                                  filter(!is.na(PO_Date)) %>%
                                  mutate(PO_Date = as.Date(as.numeric(PO_Date)))


                                  A breakdown:



                                  test1<-test %>%

                                  #convert format of "Date" column to as.Date to identify min and max dates
                                  mutate(Date = mdy(Date)) %>%

                                  #group by the Order ID
                                  group_by(Ord_ID) %>%

                                  #PO_Date will be where the "Trans_Type" is "PO" --> since the column is in date format,
                                  #dplyr will convert this to numeric, but can be fixed later
                                  mutate(PO_Date = ifelse(Trans_Type == "PO", Date, NA),

                                  #first receipt date is the minimum date of a receipt transaction
                                  Receipt_Date_First = min(Date[Trans_Type=="Receipt"]),

                                  #last receipt date is the maximum date of a receipt transaction
                                  Receipt_Date_Last = max(Date[Trans_Type=="Receipt"])) %>%

                                  #to remove duplicates
                                  filter(!is.na(PO_Date)) %>%

                                  #convert "PO_Date" column back to as.Date from numeric
                                  mutate(PO_Date = as.Date(as.numeric(PO_Date)))





                                  share|improve this answer

























                                    0












                                    0








                                    0







You can use dplyr (with lubridate's mdy() to parse the dates) to add new columns for the PO date and the first and last receipt dates:

    library(dplyr)
    library(lubridate)

    test1 <- test %>%
      mutate(Date = mdy(Date)) %>%
      group_by(Ord_ID) %>%
      mutate(PO_Date = ifelse(Trans_Type == "PO", Date, NA),
             Receipt_Date_First = min(Date[Trans_Type == "Receipt"]),
             Receipt_Date_Last = max(Date[Trans_Type == "Receipt"])) %>%
      filter(!is.na(PO_Date)) %>%
      mutate(PO_Date = as.Date(as.numeric(PO_Date), origin = "1970-01-01"))

A breakdown:

    test1 <- test %>%

      # parse the "Date" strings into Date objects so min() and max() compare dates
      mutate(Date = mdy(Date)) %>%

      # group by the order ID
      group_by(Ord_ID) %>%

      # PO_Date is the date on rows where "Trans_Type" is "PO"; base ifelse() drops
      # the Date class and returns plain numbers, which is fixed in the last step
      mutate(PO_Date = ifelse(Trans_Type == "PO", Date, NA),

             # first receipt date is the minimum date of a receipt transaction
             Receipt_Date_First = min(Date[Trans_Type == "Receipt"]),

             # last receipt date is the maximum date of a receipt transaction
             Receipt_Date_Last = max(Date[Trans_Type == "Receipt"])) %>%

      # keep only the PO rows, so the receipt rows do not repeat each order
      filter(!is.na(PO_Date)) %>%

      # convert "PO_Date" back from numeric to Date; the explicit origin is
      # required for numeric input in R versions before 4.3
      mutate(PO_Date = as.Date(as.numeric(PO_Date), origin = "1970-01-01"))
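If each order has exactly one "PO" row (an assumption, not something stated above), the same result can be sketched with summarise(), which avoids the round trip through numeric because min() and max() keep the Date class. The sample data frame below is invented for illustration; only the column names come from the answer. Note that an order with no "Receipt" rows would produce a warning and an infinite date here, so this is a minimal sketch rather than a drop-in replacement.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data: column names follow the answer above,
# the values are made up for illustration.
test <- data.frame(
  Ord_ID     = c(1, 1, 1, 2, 2),
  Trans_Type = c("PO", "Receipt", "Receipt", "PO", "Receipt"),
  Date       = c("1/5/2019", "1/10/2019", "1/20/2019", "2/1/2019", "2/15/2019"),
  stringsAsFactors = FALSE
)

test2 <- test %>%
  # parse month/day/year strings into Date objects
  mutate(Date = mdy(Date)) %>%
  group_by(Ord_ID) %>%
  # collapse each order to one row; min()/max() on Date vectors return Dates
  summarise(
    PO_Date            = min(Date[Trans_Type == "PO"]),
    Receipt_Date_First = min(Date[Trans_Type == "Receipt"]),
    Receipt_Date_Last  = max(Date[Trans_Type == "Receipt"])
  )

print(test2)
```

Unlike the mutate() version, this drops any other columns in the data and returns one ungrouped row per Ord_ID.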















                                    answered Mar 21 at 23:21









S. Ash
413


























