Organizing a dataframe - splitting one column into three

I have a dataset that looks like this:

Ord_ID Supplier Trans_Type Date
1 A PO 2/3/18
1 A Receipt 2/15/18
2 B PO 2/4/18
2 B Receipt 3/13/18
3 C PO 2/7/18
3 C Receipt 3/1/18
3 C Receipt 3/5/18
3 C Receipt 3/29/18
4 B PO 2/9/18
4 B Receipt 2/20/18
4 B Receipt 2/27/18
5 D PO 2/18/18
5 D Receipt 4/2/18

I need to split the Date column into three columns: a PO_Date column, a column with the earliest receipt date for each order, and a column with the last receipt date for each order. For orders that have only one receipt date, the second and third columns should be the same. I tried spread(), but it didn't work, I guess because the number of Receipt dates varies from order to order. How can I make this happen?

Desired result:

Ord_ID Supplier PO_Date First_Receipt_Date Last_Receipt_Date
1 A 2/3/18 2/15/18 2/15/18
2 B 2/4/18 3/13/18 3/13/18
3 C 2/7/18 3/1/18 3/29/18
4 B 2/9/18 2/20/18 2/27/18
5 D 2/18/18 4/2/18 4/2/18

asked Mar 21 at 21:37

Millie

r dataframe spread

5 Answers

Using dplyr. First, make sure column Date is in date format. Assume dataframe is named mydata:

library(dplyr)
mydata <- mydata %>%
  mutate(Date = as.Date(Date, "%m/%d/%y"))

Now you can filter for Receipt, calculate max/min dates, then filter the original data for PO and join them together:

mydata %>%
  filter(Trans_Type == "Receipt") %>%
  group_by(Ord_ID, Supplier) %>%
  summarise(First_Receipt_Date = min(Date),
            Last_Receipt_Date = max(Date)) %>%
  ungroup() %>%
  left_join(filter(mydata, Trans_Type == "PO")) %>%
  select(Ord_ID, Supplier, PO_Date = Date, First_Receipt_Date, Last_Receipt_Date)

Result:

 Ord_ID Supplier PO_Date First_Receipt_Date Last_Receipt_Date
 <int> <chr> <date> <date> <date> 
1 1 A 2018-02-03 2018-02-15 2018-02-15 
2 2 B 2018-02-04 2018-03-13 2018-03-13 
3 3 C 2018-02-07 2018-03-01 2018-03-29 
4 4 B 2018-02-09 2018-02-20 2018-02-27 
5 5 D 2018-02-18 2018-04-02 2018-04-02

answered Mar 21 at 22:06

neilfws

When I run this I get this error: "Error: by required, because the data sources have no common variables"

– Millie
Mar 22 at 18:36

Works for me using the example data in the question: the join is on Ord_ID and Supplier.

– neilfws
Mar 23 at 3:06

Got it to work this time. Not sure what was wrong last week; I updated packages earlier which might have done the trick.

– Millie
Mar 25 at 22:05
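For anyone who hits the same "by required" message: dplyr reports it when the two tables share no column names, so one possible cause (a guess, not confirmed in this thread) is that the column names in the real data differ from the example. A minimal sketch that names the join keys explicitly, so the join does not depend on auto-detection. The data frame below is typed in from the question, and the names receipts and result are only for illustration:

```r
library(dplyr)

# Sample data typed in from the question; Date starts out as character.
mydata <- data.frame(
  Ord_ID     = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L),
  Supplier   = c("A", "A", "B", "B", "C", "C", "C", "C", "B", "B", "B", "D", "D"),
  Trans_Type = c("PO", "Receipt", "PO", "Receipt", "PO", "Receipt", "Receipt",
                 "Receipt", "PO", "Receipt", "Receipt", "PO", "Receipt"),
  Date       = c("2/3/18", "2/15/18", "2/4/18", "3/13/18", "2/7/18", "3/1/18",
                 "3/5/18", "3/29/18", "2/9/18", "2/20/18", "2/27/18", "2/18/18",
                 "4/2/18"),
  stringsAsFactors = FALSE
) %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%y"))

# First and last receipt date per order
receipts <- mydata %>%
  filter(Trans_Type == "Receipt") %>%
  group_by(Ord_ID, Supplier) %>%
  summarise(First_Receipt_Date = min(Date),
            Last_Receipt_Date  = max(Date)) %>%
  ungroup()

# Join the PO rows back on, naming the key columns explicitly
result <- receipts %>%
  left_join(filter(mydata, Trans_Type == "PO"),
            by = c("Ord_ID", "Supplier")) %>%
  select(Ord_ID, Supplier, PO_Date = Date,
         First_Receipt_Date, Last_Receipt_Date)
```

If the key columns are misspelled or missing, the explicit by should fail with an error that names the offending column, which is usually easier to debug.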

With tidyverse, borrowing the sample data (df) from @divibisan's answer:

library(tidyverse)

df %>%
 group_by(Ord_ID, Supplier) %>%
 slice(c(1:2, n())) %>%
 mutate(Trans_Type = c("PO_Date","First_Receipt_Date","Last_Receipt_Date")) %>%
 spread(Trans_Type, Date) %>%
 ungroup()

# # A tibble: 5 x 5
# Ord_ID Supplier First_Receipt_Date Last_Receipt_Date PO_Date 
# <int> <fct> <date> <date> <date> 
# 1 1 A 2018-02-15 2018-02-15 2018-02-03
# 2 2 B 2018-03-13 2018-03-13 2018-02-04
# 3 3 C 2018-03-01 2018-03-29 2018-02-07
# 4 4 B 2018-02-20 2018-02-27 2018-02-09
# 5 5 D 2018-04-02 2018-04-02 2018-02-18

If the data is not sorted as in the sample data, add %>% arrange(Trans_Type, Date) as a first step.

edited Mar 22 at 17:25

answered Mar 22 at 9:58

Moody_Mudskipper

I would recommend not using slice: it is not reproducible, because you do not know the order of the data. At least use arrange first. A better way would be a combination of filter with first and last, I guess.

– Sébastien Rochette
Mar 22 at 14:32

If the data is not sorted, you can add %>% arrange(Trans_Type, Date) as a first step. Given the shape of the sample data, I assumed it was fair to treat it as sorted. Other assumptions are that there is always a "PO" value, that there are no values other than "PO" and "Receipt", that the order of the output columns isn't important, etc.

– Moody_Mudskipper
Mar 22 at 17:15

I don't understand the point on filter with first and last, I use slice in its precise intended use case IMO.

– Moody_Mudskipper
Mar 22 at 17:18

I know this is correct with your assumptions in this specific case. But because you never know how a dataset is built, nor how it will be updated later, I do not recommend using indices to select or filter datasets. There should be a reason, found in the data itself, for why you chose these specific rows. Here, they are the smallest and the biggest values. I try to always think about the future use of my scripts. This is a personal recommendation.

– Sébastien Rochette
Mar 22 at 17:22

That's a legitimate point, but there's also value in concise code, and on SO you're rarely 100% explicit about assumptions anyway, so it's a gray area. Actually my first answer had the arrange part but I edited it out (not showing in the edit history as I edited right away). I added a note at the end of my post as a compromise :).

– Moody_Mudskipper
Mar 22 at 17:29
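The value-based selection discussed in these comments could be sketched roughly as follows. This is not code from either commenter; it assumes Date is already of class Date, uses a shuffled subset of the question's data, and the name by_value is made up for the example:

```r
library(dplyr)

# A subset of the question's data, deliberately shuffled so that
# row position carries no meaning.
df <- data.frame(
  Ord_ID     = c(3L, 1L, 3L, 3L, 1L, 3L),
  Supplier   = c("C", "A", "C", "C", "A", "C"),
  Trans_Type = c("Receipt", "Receipt", "PO", "Receipt", "PO", "Receipt"),
  Date       = as.Date(c("2018-03-29", "2018-02-15", "2018-02-07",
                         "2018-03-01", "2018-02-03", "2018-03-05")),
  stringsAsFactors = FALSE
)

by_value <- df %>%
  arrange(Date) %>%                      # sort once, so first/last mean earliest/latest
  group_by(Ord_ID, Supplier) %>%
  summarise(PO_Date            = first(Date[Trans_Type == "PO"]),
            First_Receipt_Date = first(Date[Trans_Type == "Receipt"]),
            Last_Receipt_Date  = last(Date[Trans_Type == "Receipt"])) %>%
  ungroup()
```

Each output column is picked by its Trans_Type value rather than by row position, so the result should not depend on how the input happens to be ordered.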

I would start with something like this:

data %>%
  # Ord_ID is included so that two orders from the same supplier stay separate
  group_by(Ord_ID, Supplier, Trans_Type) %>%
  summarise(min_date = min(Date),
            max_date = max(Date)) %>%
  ungroup()

Then, you can play with gather and spread to retrieve the columns you need.
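One way the reshaping step might be finished is sketched below. It swaps in tidyr::pivot_wider (tidyr 1.0.0 or later) instead of gather and spread, so it is a variation on this answer rather than the answer's own method. The sample rows are orders 2 and 4 from the question (both from supplier B), Date is assumed to already be of class Date, and the names summary_long and result are made up:

```r
library(dplyr)
library(tidyr)

# Orders 2 and 4 from the question; both belong to supplier B.
data <- data.frame(
  Ord_ID     = c(2L, 2L, 4L, 4L, 4L),
  Supplier   = c("B", "B", "B", "B", "B"),
  Trans_Type = c("PO", "Receipt", "PO", "Receipt", "Receipt"),
  Date       = as.Date(c("2018-02-04", "2018-03-13", "2018-02-09",
                         "2018-02-20", "2018-02-27")),
  stringsAsFactors = FALSE
)

# One row per order and transaction type, with its earliest and latest date
summary_long <- data %>%
  group_by(Ord_ID, Supplier, Trans_Type) %>%
  summarise(min_date = min(Date),
            max_date = max(Date)) %>%
  ungroup()

# Widen to min_date_PO, min_date_Receipt, max_date_PO, max_date_Receipt,
# then keep and rename the three columns the question asks for.
result <- summary_long %>%
  pivot_wider(names_from = Trans_Type,
              values_from = c(min_date, max_date)) %>%
  select(Ord_ID, Supplier,
         PO_Date            = min_date_PO,
         First_Receipt_Date = min_date_Receipt,
         Last_Receipt_Date  = max_date_Receipt)
```

Grouping by Ord_ID as well as Supplier matters here: with Supplier alone, orders 2 and 4 would collapse into a single row.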

answered Mar 21 at 21:55

Sébastien Rochette

Here's another tidyverse-based solution that avoids the left_join. I have no idea which approach would be faster on a large dataset, but it's always good to have more options:

df <- structure(list(Ord_ID = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 
4L, 4L, 5L, 5L), Supplier = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 
3L, 3L, 2L, 2L, 2L, 4L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"), 
 Trans_Type = c("PO", "Receipt", "PO", "Receipt", "PO", "Receipt", 
 "Receipt", "Receipt", "PO", "Receipt", "Receipt", "PO", "Receipt"
 ), Date = structure(c(17565, 17577, 17566, 17603, 17569, 
 17591, 17595, 17619, 17571, 17582, 17589, 17580, 17623), class = "Date")), row.names = c(NA, 
-13L), class = "data.frame")



df %>%
 group_by(Ord_ID, Supplier, Trans_Type) %>%
 # Keep only min and max date values
 filter(Date == min(Date) | Date == max(Date) | Trans_Type != 'Receipt') %>%
 # Rename 2nd Receipt value Receipt_2 so there are no duplicated values
 mutate(Trans_Type2 = if_else(Trans_Type == 'Receipt' & row_number() == 2,
 'Receipt_2', Trans_Type)) %>%
 # Drop Trans_Type variable (we can't replace in mutate since it's a grouping var)
 ungroup(Trans_Type) %>%
 select(-Trans_Type) %>%
 # Spread the now unduplicated Trans_Type values
 spread(Trans_Type2, Date) %>%
 # Fill in Receipt_2 values where they're missing
 mutate(Receipt_2 = if_else(is.na(Receipt_2), Receipt, Receipt_2))

# A tibble: 5 x 5
 Ord_ID Supplier PO Receipt Receipt_2 
 <int> <fct> <date> <date> <date> 
1 1 A 2018-02-03 2018-02-15 2018-02-15
2 2 B 2018-02-04 2018-03-13 2018-03-13
3 3 C 2018-02-07 2018-03-01 2018-03-29
4 4 B 2018-02-09 2018-02-20 2018-02-27
5 5 D 2018-02-18 2018-04-02 2018-04-02

answered Mar 21 at 22:24

divibisan

You can just use dplyr to mutate new columns for PO date, and first and last receipt dates:

library(dplyr)
library(lubridate)  # provides mdy()

test1 <- test %>%
  mutate(Date = mdy(Date)) %>%
  group_by(Ord_ID) %>%
  mutate(PO_Date = ifelse(Trans_Type == "PO", Date, NA),
         Receipt_Date_First = min(Date[Trans_Type == "Receipt"]),
         Receipt_Date_Last = max(Date[Trans_Type == "Receipt"])) %>%
  filter(!is.na(PO_Date)) %>%
  mutate(PO_Date = as.Date(as.numeric(PO_Date), origin = "1970-01-01"))

A breakdown:

test1 <- test %>%

  #convert the "Date" column to class Date so min and max dates can be identified
  mutate(Date = mdy(Date)) %>%

  #group by the Order ID
  group_by(Ord_ID) %>%

  #PO_Date will be where the "Trans_Type" is "PO" --> ifelse() drops the Date class
  #and returns the underlying number, but this can be fixed later
  mutate(PO_Date = ifelse(Trans_Type == "PO", Date, NA),

         #first receipt date is the minimum date of a receipt transaction
         Receipt_Date_First = min(Date[Trans_Type == "Receipt"]),

         #last receipt date is the maximum date of a receipt transaction
         Receipt_Date_Last = max(Date[Trans_Type == "Receipt"])) %>%

  #keep only the PO row of each order, to remove duplicates
  filter(!is.na(PO_Date)) %>%

  #convert "PO_Date" back to class Date from numeric
  #(origin is required by as.Date() on numbers before R 4.3)
  mutate(PO_Date = as.Date(as.numeric(PO_Date), origin = "1970-01-01"))
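
The numeric round trip is needed because base ifelse() returns the underlying number of days rather than a Date. A small sketch of that behaviour, next to dplyr::if_else(), which keeps the class when both branches are Dates. The vectors d and type are made up for the example and are not part of the answer:

```r
library(dplyr)

d    <- as.Date(c("2018-02-03", "2018-02-15"))
type <- c("PO", "Receipt")

# Base ifelse() strips attributes, so the Date class is lost
base_version <- ifelse(type == "PO", d, NA)

# if_else() keeps the Date class; the "missing" branch must be a Date too
dplyr_version <- if_else(type == "PO", d, as.Date(NA))

class(base_version)   # "numeric"
class(dplyr_version)  # "Date"
```

With if_else() in the pipeline above, the final conversion back to Date should no longer be necessary.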

answered Mar 21 at 23:21

S. Ash

413

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55289624%2forganizing-a-dataframe-splitting-one-column-into-three%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

5 Answers
5

active

oldest

votes

5 Answers
5

active

oldest

votes

Using dplyr. First, make sure column Date is in date format. Assume dataframe is named mydata:

library(dplyr)
mydata <- mydata %>% 
 mutate(Date = as.Date(Date, "%m/%d/%y")

Now you can filter for Receipt, calculate max/min dates, then filter the original data for PO and join them together:

mydata %>% 
 filter(Trans_Type == "Receipt") %>% 
 group_by(Ord_ID, Supplier) %>% 
 summarise(First_Receipt_Date = min(Date), 
 Last_Receipt_Date = max(Date)) %>% 
 ungroup() %>%
 left_join(filter(mydata, Trans_Type == "PO")) %>% 
 select(Ord_ID, Supplier, PO_Date = Date, First_Receipt_Date, Last_Receipt_Date)

Result:

 Ord_ID Supplier PO_Date First_Receipt_Date Last_Receipt_Date
 <int> <chr> <date> <date> <date> 
1 1 A 2018-02-03 2018-02-15 2018-02-15 
2 2 B 2018-02-04 2018-03-13 2018-03-13 
3 3 C 2018-02-07 2018-03-01 2018-03-29 
4 4 B 2018-02-09 2018-02-20 2018-02-27 
5 5 D 2018-02-18 2018-04-02 2018-04-02

answered Mar 21 at 22:06

neilfws

18.6k53749

When I run this I get this error: "Error: by required, because the data sources have no common variables"

– Millie
Mar 22 at 18:36

Works for me using the example data in the question: the join is on Ord_ID and Supplier.

– neilfws
Mar 23 at 3:06

Got it to work this time. Not sure what was wrong last week; I updated packages earlier which might have done the trick.

– Millie
Mar 25 at 22:05

add a comment |

Using dplyr. First, make sure column Date is in date format. Assume dataframe is named mydata:

library(dplyr)
mydata <- mydata %>% 
 mutate(Date = as.Date(Date, "%m/%d/%y")

Now you can filter for Receipt, calculate max/min dates, then filter the original data for PO and join them together:

mydata %>% 
 filter(Trans_Type == "Receipt") %>% 
 group_by(Ord_ID, Supplier) %>% 
 summarise(First_Receipt_Date = min(Date), 
 Last_Receipt_Date = max(Date)) %>% 
 ungroup() %>%
 left_join(filter(mydata, Trans_Type == "PO")) %>% 
 select(Ord_ID, Supplier, PO_Date = Date, First_Receipt_Date, Last_Receipt_Date)

Result:

 Ord_ID Supplier PO_Date First_Receipt_Date Last_Receipt_Date
 <int> <chr> <date> <date> <date> 
1 1 A 2018-02-03 2018-02-15 2018-02-15 
2 2 B 2018-02-04 2018-03-13 2018-03-13 
3 3 C 2018-02-07 2018-03-01 2018-03-29 
4 4 B 2018-02-09 2018-02-20 2018-02-27 
5 5 D 2018-02-18 2018-04-02 2018-04-02

answered Mar 21 at 22:06

neilfws

18.6k53749

When I run this I get this error: "Error: by required, because the data sources have no common variables"

– Millie
Mar 22 at 18:36

Works for me using the example data in the question: the join is on Ord_ID and Supplier.

– neilfws
Mar 23 at 3:06

Got it to work this time. Not sure what was wrong last week; I updated packages earlier which might have done the trick.

– Millie
Mar 25 at 22:05

add a comment |

Using dplyr. First, make sure column Date is in date format. Assume dataframe is named mydata:

library(dplyr)
mydata <- mydata %>% 
 mutate(Date = as.Date(Date, "%m/%d/%y")

Now you can filter for Receipt, calculate max/min dates, then filter the original data for PO and join them together:

mydata %>% 
 filter(Trans_Type == "Receipt") %>% 
 group_by(Ord_ID, Supplier) %>% 
 summarise(First_Receipt_Date = min(Date), 
 Last_Receipt_Date = max(Date)) %>% 
 ungroup() %>%
 left_join(filter(mydata, Trans_Type == "PO")) %>% 
 select(Ord_ID, Supplier, PO_Date = Date, First_Receipt_Date, Last_Receipt_Date)

Result:

 Ord_ID Supplier PO_Date First_Receipt_Date Last_Receipt_Date
 <int> <chr> <date> <date> <date> 
1 1 A 2018-02-03 2018-02-15 2018-02-15 
2 2 B 2018-02-04 2018-03-13 2018-03-13 
3 3 C 2018-02-07 2018-03-01 2018-03-29 
4 4 B 2018-02-09 2018-02-20 2018-02-27 
5 5 D 2018-02-18 2018-04-02 2018-04-02

answered Mar 21 at 22:06

neilfws

18.6k53749

Using dplyr. First, make sure column Date is in date format. Assume dataframe is named mydata:

library(dplyr)
mydata <- mydata %>% 
 mutate(Date = as.Date(Date, "%m/%d/%y")

Now you can filter for Receipt, calculate max/min dates, then filter the original data for PO and join them together:

mydata %>% 
 filter(Trans_Type == "Receipt") %>% 
 group_by(Ord_ID, Supplier) %>% 
 summarise(First_Receipt_Date = min(Date), 
 Last_Receipt_Date = max(Date)) %>% 
 ungroup() %>%
 left_join(filter(mydata, Trans_Type == "PO")) %>% 
 select(Ord_ID, Supplier, PO_Date = Date, First_Receipt_Date, Last_Receipt_Date)

Result:

 Ord_ID Supplier PO_Date First_Receipt_Date Last_Receipt_Date
 <int> <chr> <date> <date> <date> 
1 1 A 2018-02-03 2018-02-15 2018-02-15 
2 2 B 2018-02-04 2018-03-13 2018-03-13 
3 3 C 2018-02-07 2018-03-01 2018-03-29 
4 4 B 2018-02-09 2018-02-20 2018-02-27 
5 5 D 2018-02-18 2018-04-02 2018-04-02

answered Mar 21 at 22:06

neilfws

18.6k53749

answered Mar 21 at 22:06

neilfws

18.6k53749

answered Mar 21 at 22:06

neilfws

18.6k53749

answered Mar 21 at 22:06

neilfws

18.6k53749

When I run this I get this error: "Error: by required, because the data sources have no common variables"

– Millie
Mar 22 at 18:36

Works for me using the example data in the question: the join is on Ord_ID and Supplier.

– neilfws
Mar 23 at 3:06

Got it to work this time. Not sure what was wrong last week; I updated packages earlier which might have done the trick.

– Millie
Mar 25 at 22:05

add a comment |

When I run this I get this error: "Error: by required, because the data sources have no common variables"

– Millie
Mar 22 at 18:36

Works for me using the example data in the question: the join is on Ord_ID and Supplier.

– neilfws
Mar 23 at 3:06

Got it to work this time. Not sure what was wrong last week; I updated packages earlier which might have done the trick.

– Millie
Mar 25 at 22:05

When I run this I get this error: "Error: by required, because the data sources have no common variables"

– Millie
Mar 22 at 18:36

Works for me using the example data in the question: the join is on Ord_ID and Supplier.

– neilfws
Mar 23 at 3:06

Got it to work this time. Not sure what was wrong last week; I updated packages earlier which might have done the trick.

– Millie
Mar 25 at 22:05

add a comment |

With tidyverse, borrowing @divibisan's sample data :

library(tidyverse)

df %>%
 group_by(Ord_ID, Supplier) %>%
 slice(c(1:2, n())) %>%
 mutate(Trans_Type = c("PO_Date","First_Receipt_Date","Last_Receipt_Date")) %>%
 spread(Trans_Type, Date) %>%
 ungroup()

# # A tibble: 5 x 5
# Ord_ID Supplier First_Receipt_Date Last_Receipt_Date PO_Date 
# <int> <fct> <date> <date> <date> 
# 1 1 A 2018-02-15 2018-02-15 2018-02-03
# 2 2 B 2018-03-13 2018-03-13 2018-02-04
# 3 3 C 2018-03-01 2018-03-29 2018-02-07
# 4 4 B 2018-02-20 2018-02-27 2018-02-09
# 5 5 D 2018-04-02 2018-04-02 2018-02-18

If the data is not sorted as in the sample data, add %>% arrange(Trans_Type, Date) as a first step.

edited Mar 22 at 17:25

answered Mar 22 at 9:58

Moody_Mudskipper

24.7k33570

I would recommend not to use slice, it is not reproducible as you do not know the order of data. At least, use arrange before. A better way would be a combination of filter with first and last I guess.

– Sébastien Rochette
Mar 22 at 14:32

If data is not sorted, you can add %>% arrange(Trans_Type, Date) as a first step. given the shape of the sample data I assumed it was fair to assume it is sorted. Other assumptions are that there is always a "PO" value, that there are no other values than "PO" and "Receipt", that order of the output columns wasn't important etc...

– Moody_Mudskipper
Mar 22 at 17:15

I don't understand the point on filter with first and last, I use slice in its precise intended use case IMO.

– Moody_Mudskipper
Mar 22 at 17:18

I know this is correct with your assumptions in this specific case. But because you never know how a dataset is built, nor can you know how it will be later updated, I do not recommend the use of indices for the selection/filtering of datasets. There must be a better explanation, included in the data, as why you chose these specific lines. Here, these are the smallest and the biggest values. I try to always think about the future use of my scripts. This is a personal recommendation.

– Sébastien Rochette
Mar 22 at 17:22

1

That's a legitimate point, but there's also value in concise code, and on SO you're rarely 100% explicit about assumptions anyway so it's a gray area. Actually my first answer had the arrange part but I edited it out (not showing in edit history as i edited right away). I added a note at the end of my post as a compromise :).

– Moody_Mudskipper
Mar 22 at 17:29

|
show 5 more comments

With tidyverse, borrowing @divibisan's sample data :

library(tidyverse)

df %>%
 group_by(Ord_ID, Supplier) %>%
 slice(c(1:2, n())) %>%
 mutate(Trans_Type = c("PO_Date","First_Receipt_Date","Last_Receipt_Date")) %>%
 spread(Trans_Type, Date) %>%
 ungroup()

# # A tibble: 5 x 5
# Ord_ID Supplier First_Receipt_Date Last_Receipt_Date PO_Date 
# <int> <fct> <date> <date> <date> 
# 1 1 A 2018-02-15 2018-02-15 2018-02-03
# 2 2 B 2018-03-13 2018-03-13 2018-02-04
# 3 3 C 2018-03-01 2018-03-29 2018-02-07
# 4 4 B 2018-02-20 2018-02-27 2018-02-09
# 5 5 D 2018-04-02 2018-04-02 2018-02-18

If the data is not sorted as in the sample data, add %>% arrange(Trans_Type, Date) as a first step.

edited Mar 22 at 17:25

answered Mar 22 at 9:58

Moody_Mudskipper

24.7k33570

I would recommend not to use slice, it is not reproducible as you do not know the order of data. At least, use arrange before. A better way would be a combination of filter with first and last I guess.

– Sébastien Rochette
Mar 22 at 14:32

If data is not sorted, you can add %>% arrange(Trans_Type, Date) as a first step. given the shape of the sample data I assumed it was fair to assume it is sorted. Other assumptions are that there is always a "PO" value, that there are no other values than "PO" and "Receipt", that order of the output columns wasn't important etc...

– Moody_Mudskipper
Mar 22 at 17:15

I don't understand the point on filter with first and last, I use slice in its precise intended use case IMO.

– Moody_Mudskipper
Mar 22 at 17:18

I know this is correct with your assumptions in this specific case. But because you never know how a dataset is built, nor can you know how it will be later updated, I do not recommend the use of indices for the selection/filtering of datasets. There must be a better explanation, included in the data, as why you chose these specific lines. Here, these are the smallest and the biggest values. I try to always think about the future use of my scripts. This is a personal recommendation.

– Sébastien Rochette
Mar 22 at 17:22

1

That's a legitimate point, but there's also value in concise code, and on SO you're rarely 100% explicit about assumptions anyway so it's a gray area. Actually my first answer had the arrange part but I edited it out (not showing in edit history as i edited right away). I added a note at the end of my post as a compromise :).

– Moody_Mudskipper
Mar 22 at 17:29

|
show 5 more comments

With tidyverse, borrowing @divibisan's sample data :

library(tidyverse)

df %>%
 group_by(Ord_ID, Supplier) %>%
 slice(c(1:2, n())) %>%
 mutate(Trans_Type = c("PO_Date","First_Receipt_Date","Last_Receipt_Date")) %>%
 spread(Trans_Type, Date) %>%
 ungroup()

# # A tibble: 5 x 5
# Ord_ID Supplier First_Receipt_Date Last_Receipt_Date PO_Date 
# <int> <fct> <date> <date> <date> 
# 1 1 A 2018-02-15 2018-02-15 2018-02-03
# 2 2 B 2018-03-13 2018-03-13 2018-02-04
# 3 3 C 2018-03-01 2018-03-29 2018-02-07
# 4 4 B 2018-02-20 2018-02-27 2018-02-09
# 5 5 D 2018-04-02 2018-04-02 2018-02-18

If the data is not sorted as in the sample data, add %>% arrange(Trans_Type, Date) as a first step.

edited Mar 22 at 17:25

answered Mar 22 at 9:58

Moody_Mudskipper

24.7k33570

With tidyverse, borrowing @divibisan's sample data :

library(tidyverse)

df %>%
 group_by(Ord_ID, Supplier) %>%
 slice(c(1:2, n())) %>%
 mutate(Trans_Type = c("PO_Date","First_Receipt_Date","Last_Receipt_Date")) %>%
 spread(Trans_Type, Date) %>%
 ungroup()

# # A tibble: 5 x 5
# Ord_ID Supplier First_Receipt_Date Last_Receipt_Date PO_Date 
# <int> <fct> <date> <date> <date> 
# 1 1 A 2018-02-15 2018-02-15 2018-02-03
# 2 2 B 2018-03-13 2018-03-13 2018-02-04
# 3 3 C 2018-03-01 2018-03-29 2018-02-07
# 4 4 B 2018-02-20 2018-02-27 2018-02-09
# 5 5 D 2018-04-02 2018-04-02 2018-02-18

If the data is not sorted as in the sample data, add %>% arrange(Trans_Type, Date) as a first step.

edited Mar 22 at 17:25

answered Mar 22 at 9:58

Moody_Mudskipper

24.7k33570

edited Mar 22 at 17:25

answered Mar 22 at 9:58

Moody_Mudskipper

24.7k33570

answered Mar 22 at 9:58

Moody_Mudskipper

24.7k33570

answered Mar 22 at 9:58

Moody_Mudskipper

24.7k33570

I would recommend not to use slice, it is not reproducible as you do not know the order of data. At least, use arrange before. A better way would be a combination of filter with first and last I guess.

– Sébastien Rochette
Mar 22 at 14:32

If data is not sorted, you can add %>% arrange(Trans_Type, Date) as a first step. given the shape of the sample data I assumed it was fair to assume it is sorted. Other assumptions are that there is always a "PO" value, that there are no other values than "PO" and "Receipt", that order of the output columns wasn't important etc...

– Moody_Mudskipper
Mar 22 at 17:15

I don't understand the point on filter with first and last, I use slice in its precise intended use case IMO.

– Moody_Mudskipper
Mar 22 at 17:18

I know this is correct with your assumptions in this specific case. But because you never know how a dataset is built, nor can you know how it will be later updated, I do not recommend the use of indices for the selection/filtering of datasets. There must be a better explanation, included in the data, as why you chose these specific lines. Here, these are the smallest and the biggest values. I try to always think about the future use of my scripts. This is a personal recommendation.

– Sébastien Rochette
Mar 22 at 17:22

1

That's a legitimate point, but there's also value in concise code, and on SO you're rarely 100% explicit about assumptions anyway so it's a gray area. Actually my first answer had the arrange part but I edited it out (not showing in edit history as i edited right away). I added a note at the end of my post as a compromise :).

– Moody_Mudskipper
Mar 22 at 17:29

|
show 5 more comments

I would recommend not to use slice, it is not reproducible as you do not know the order of data. At least, use arrange before. A better way would be a combination of filter with first and last I guess.

– Sébastien Rochette
Mar 22 at 14:32

If data is not sorted, you can add %>% arrange(Trans_Type, Date) as a first step. given the shape of the sample data I assumed it was fair to assume it is sorted. Other assumptions are that there is always a "PO" value, that there are no other values than "PO" and "Receipt", that order of the output columns wasn't important etc...

– Moody_Mudskipper
Mar 22 at 17:15

I don't understand the point on filter with first and last, I use slice in its precise intended use case IMO.

– Moody_Mudskipper
Mar 22 at 17:18

I know this is correct with your assumptions in this specific case. But because you never know how a dataset is built, nor can you know how it will be later updated, I do not recommend the use of indices for the selection/filtering of datasets. There must be a better explanation, included in the data, as why you chose these specific lines. Here, these are the smallest and the biggest values. I try to always think about the future use of my scripts. This is a personal recommendation.

– Sébastien Rochette
Mar 22 at 17:22

1

That's a legitimate point, but there's also value in concise code, and on SO you're rarely 100% explicit about assumptions anyway so it's a gray area. Actually my first answer had the arrange part but I edited it out (not showing in edit history as i edited right away). I added a note at the end of my post as a compromise :).

– Moody_Mudskipper
Mar 22 at 17:29

I would recommend not to use slice, it is not reproducible as you do not know the order of data. At least, use arrange before. A better way would be a combination of filter with first and last I guess.

– Sébastien Rochette
Mar 22 at 14:32

If data is not sorted, you can add %>% arrange(Trans_Type, Date) as a first step. given the shape of the sample data I assumed it was fair to assume it is sorted. Other assumptions are that there is always a "PO" value, that there are no other values than "PO" and "Receipt", that order of the output columns wasn't important etc...

– Moody_Mudskipper
Mar 22 at 17:15

I don't understand the point on filter with first and last, I use slice in its precise intended use case IMO.

– Moody_Mudskipper
Mar 22 at 17:18

I know this is correct with your assumptions in this specific case. But because you never know how a dataset is built, nor can you know how it will be later updated, I do not recommend the use of indices for the selection/filtering of datasets. There must be a better explanation, included in the data, as why you chose these specific lines. Here, these are the smallest and the biggest values. I try to always think about the future use of my scripts. This is a personal recommendation.

– Sébastien Rochette
Mar 22 at 17:22

That's a legitimate point, but there's also value in concise code, and on SO you're rarely 100% explicit about assumptions anyway so it's a gray area. Actually my first answer had the arrange part but I edited it out (not showing in edit history as i edited right away). I added a note at the end of my post as a compromise :).

– Moody_Mudskipper
Mar 22 at 17:29

I would start with something like this:

data %>%
  group_by(Supplier, Trans_Type) %>%
  summarise(min_date = min(Date),
            max_date = max(Date)) %>%
  ungroup()

Then, you can play with gather and spread to retrieve the columns you need.
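
As a rough sketch of how that gather and spread step might look (this is one possible continuation, not necessarily what the answer had in mind): the `data` frame below is a hypothetical sample in the same shape as the question's data, and the helper column names `stat` and `column` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample in the same shape as the question's data.
data <- data.frame(
  Supplier   = c("A", "A", "C", "C", "C", "C"),
  Trans_Type = c("PO", "Receipt", "PO", "Receipt", "Receipt", "Receipt"),
  Date       = as.Date(c("2018-02-03", "2018-02-15", "2018-02-07",
                         "2018-03-01", "2018-03-05", "2018-03-29"))
)

wide <- data %>%
  group_by(Supplier, Trans_Type) %>%
  summarise(min_date = min(Date),
            max_date = max(Date)) %>%
  ungroup() %>%
  # Long form: one row per Supplier / Trans_Type / statistic.
  gather(key = "stat", value = "Date", min_date, max_date) %>%
  # Combine the two keys into names such as "PO_min_date".
  unite("column", Trans_Type, stat) %>%
  # Back to wide form: one row per Supplier, one column per combined name.
  spread(column, Date)
```

This yields one row per supplier with a column for each Trans_Type and statistic pair; unwanted columns can then be dropped with select. In current tidyr, gather and spread are superseded by pivot_longer and pivot_wider, which could be swapped in here.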

answered Mar 21 at 21:55

Sébastien Rochette

Here's another tidyverse-based solution that avoids the left_join. I have no idea which approach would be faster on a large dataset, but it's always good to have more options:

df <- structure(list(Ord_ID = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 
4L, 4L, 5L, 5L), Supplier = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 
3L, 3L, 2L, 2L, 2L, 4L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"), 
 Trans_Type = c("PO", "Receipt", "PO", "Receipt", "PO", "Receipt", 
 "Receipt", "Receipt", "PO", "Receipt", "Receipt", "PO", "Receipt"
 ), Date = structure(c(17565, 17577, 17566, 17603, 17569, 
 17591, 17595, 17619, 17571, 17582, 17589, 17580, 17623), class = "Date")), row.names = c(NA, 
-13L), class = "data.frame")



library(dplyr)
library(tidyr)

df %>%
  group_by(Ord_ID, Supplier, Trans_Type) %>%
  # Keep only min and max date values
  filter(Date == min(Date) | Date == max(Date) | Trans_Type != 'Receipt') %>%
  # Rename 2nd Receipt value Receipt_2 so there are no duplicated values
  mutate(Trans_Type2 = if_else(Trans_Type == 'Receipt' & row_number() == 2,
                               'Receipt_2', Trans_Type)) %>%
  # Drop Trans_Type variable (we can't replace in mutate since it's a grouping var)
  ungroup(Trans_Type) %>%
  select(-Trans_Type) %>%
  # Spread the now unduplicated Trans_Type values
  spread(Trans_Type2, Date) %>%
  # Fill in Receipt_2 values where they're missing
  mutate(Receipt_2 = if_else(is.na(Receipt_2), Receipt, Receipt_2))

# A tibble: 5 x 5
  Ord_ID Supplier PO         Receipt    Receipt_2
   <int> <fct>    <date>     <date>     <date>
1      1 A        2018-02-03 2018-02-15 2018-02-15
2      2 B        2018-02-04 2018-03-13 2018-03-13
3      3 C        2018-02-07 2018-03-01 2018-03-29
4      4 B        2018-02-09 2018-02-20 2018-02-27
5      5 D        2018-02-18 2018-04-02 2018-04-02

answered Mar 21 at 22:24

divibisan

You can just use dplyr to mutate new columns for the PO date and the first and last receipt dates (mdy() comes from lubridate):

library(dplyr)
library(lubridate)

test1 <- test %>%
  mutate(Date = mdy(Date)) %>%
  group_by(Ord_ID) %>%
  mutate(PO_Date = ifelse(Trans_Type == "PO", Date, NA),
         Receipt_Date_First = min(Date[Trans_Type == "Receipt"]),
         Receipt_Date_Last = max(Date[Trans_Type == "Receipt"])) %>%
  filter(!is.na(PO_Date)) %>%
  mutate(PO_Date = as.Date(as.numeric(PO_Date), origin = "1970-01-01"))

A breakdown:

test1 <- test %>%

  # parse the "Date" column into Date class so that min and max dates can be identified
  mutate(Date = mdy(Date)) %>%

  # group by the Order ID
  group_by(Ord_ID) %>%

  # PO_Date is the date where "Trans_Type" is "PO" --> ifelse() drops the Date class
  # and returns plain numbers, but this is fixed in the last step
  mutate(PO_Date = ifelse(Trans_Type == "PO", Date, NA),

         # first receipt date is the minimum date of a receipt transaction
         Receipt_Date_First = min(Date[Trans_Type == "Receipt"]),

         # last receipt date is the maximum date of a receipt transaction
         Receipt_Date_Last = max(Date[Trans_Type == "Receipt"])) %>%

  # keep only the PO row of each order, which removes the duplicates
  filter(!is.na(PO_Date)) %>%

  # convert "PO_Date" back from numeric to Date (older R versions require an explicit origin)
  mutate(PO_Date = as.Date(as.numeric(PO_Date), origin = "1970-01-01"))
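
A possible variation on the steps above, sketched under assumptions: replacing base ifelse() with dplyr's if_else() and a typed missing value, as.Date(NA), should keep PO_Date as a Date throughout, so the conversion back from numeric is not needed. The `test` data frame here is a hypothetical sample whose Date column is already parsed, so the mdy() step is left out.

```r
library(dplyr)

# Hypothetical sample; Date is already of class Date, so mdy() is not needed here.
test <- data.frame(
  Ord_ID     = c(1L, 1L, 3L, 3L, 3L, 3L),
  Trans_Type = c("PO", "Receipt", "PO", "Receipt", "Receipt", "Receipt"),
  Date       = as.Date(c("2018-02-03", "2018-02-15", "2018-02-07",
                         "2018-03-01", "2018-03-05", "2018-03-29"))
)

test1 <- test %>%
  group_by(Ord_ID) %>%
  mutate(
    # if_else() requires both branches to share a type, hence as.Date(NA)
    PO_Date            = if_else(Trans_Type == "PO", Date, as.Date(NA)),
    Receipt_Date_First = min(Date[Trans_Type == "Receipt"]),
    Receipt_Date_Last  = max(Date[Trans_Type == "Receipt"])
  ) %>%
  # keep only the PO row of each order
  filter(!is.na(PO_Date)) %>%
  ungroup()
```

Like the original, this assumes every order has a PO row and at least one receipt; an order with no receipts would make min() and max() warn about an empty input.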

answered Mar 21 at 23:21

S. Ash
