R - Yelp data: business category column has multiple categories per business; want to separate into category-specific columns with values of 1 and 0


Thank you in advance to anyone who tries to help with this.

I'm using the Yelp data set, and the question I want to answer is "which categories are positively correlated with higher stars for X category (Bars, for example)?"

The issue I'm encountering is that, for each business, the categories are lumped together into a single column, with one row per business_id. So I need a way to separate out each category, turn the categories into columns, and then check whether the original categories column contains the category each column was created for.

My current train of thought is to group_by business_id, unnest_tokens the column, model.matrix() that column into the split I want, and then join it onto the df I'm using. But I can't get model.matrix to work and keep business_id connected to each row.

# an example of what I am using #
library(dplyr)     # tibble(), select(), group_by(), %>%
library(tidytext)  # unnest_tokens()

df <-
  tibble(business_id = c("bus_1",
                         "bus_2",
                         "bus_3"),
         categories = c("Pizza, Burgers, Caterers",
                        "Pizza, Restaurants, Bars",
                        "American, Barbeque, Restaurants"))

# what I want it to look like #
desired_df <-
  tibble(business_id = c("bus_1",
                         "bus_2",
                         "bus_3"),
         categories = c("Pizza, Burgers, Caterers",
                        "Pizza, Restaurants, Bars",
                        "American, Barbeque, Restaurants"),
         Pizza = c(1, 1, 0),
         Burgers = c(1, 0, 0),
         Caterers = c(1, 0, 0),
         Restaurants = c(0, 1, 1),
         Bars = c(0, 1, 0),
         American = c(0, 0, 1),
         Barbeque = c(0, 0, 1))

# where I am stuck #
df %>%
  select(business_id, categories) %>%
  group_by(business_id) %>%
  unnest_tokens(categories, categories, token = "regex", pattern = ", ") %>%
  model.matrix(business_id ~ categories, data = .) %>%
  as_tibble()
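A note on why the pipeline above loses business_id: model.matrix() builds its design matrix from the right-hand side of the formula only, so a variable on the left-hand side is not carried into the result, and with the default intercept the first category level is dropped as the baseline. The base R sketch below is one possible way around that, not the method from the answers: it expands the data to one row per business/category pair, builds an intercept-free indicator matrix, and sums the rows per business with rowsum(). The names parts, long, indicators and result are illustrative.

```r
# Toy data copied from the question (base R data.frame instead of a tibble).
df <- data.frame(
  business_id = c("bus_1", "bus_2", "bus_3"),
  categories  = c("Pizza, Burgers, Caterers",
                  "Pizza, Restaurants, Bars",
                  "American, Barbeque, Restaurants"),
  stringsAsFactors = FALSE
)

# One row per (business, category) pair.
parts <- strsplit(df$categories, ", ", fixed = TRUE)
long <- data.frame(
  business_id = rep(df$business_id, lengths(parts)),
  category    = unlist(parts),
  stringsAsFactors = FALSE
)

# "- 1" removes the intercept so every category gets its own 0/1 column.
mm <- model.matrix(~ category - 1, data = long)
colnames(mm) <- sub("^category", "", colnames(mm))

# Collapse back to one row per business; row names are the business ids.
# A category listed twice for the same business would give a count of 2 here.
indicators <- rowsum(mm, group = long$business_id)

# Reattach to the original data, matching rows by business_id.
result <- cbind(df, indicators[df$business_id, , drop = FALSE])
result
```

The indicator columns come out in alphabetical order because model.matrix() converts the character column to a factor with sorted levels.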

Edit: After this post and the answers below, I encountered a duplicate identifiers error using spread(). That brought me to this thread, https://github.com/tidyverse/tidyr/issues/426, where the answer to my question was posted. I've repasted it below.

# duplicating the error with a smaller data.frame #

library(tidyverse)
df <- structure(list(age = c("21", "17", "32", "29", "15"),
                     gender = structure(c(2L, 1L, 1L, 2L, 2L), .Label = c("Female", "Male"), class = "factor")),
                row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("age", "gender"))
df
#> # A tibble: 5 x 2
#>   age   gender
#>   <chr> <fct>
#> 1 21    Male
#> 2 17    Female
#> 3 32    Female
#> 4 29    Male
#> 5 15    Male

df %>%
  spread(key = gender, value = age)
#> Error: Duplicate identifiers for rows (2, 3), (1, 4, 5)

# fixing the problem #

df %>%
  group_by_at(vars(-age)) %>%              # group by everything other than the value column
  mutate(row_id = 1:n()) %>% ungroup() %>% # build group index
  spread(key = gender, value = age) %>%    # spread
  select(-row_id)                          # drop the index

#> # A tibble: 3 x 2
#>   Female Male
#>   <chr>  <chr>
#> 1 17     21
#> 2 32     29
#> 3 NA     15

edited Apr 1 at 20:15

asked Mar 27 at 23:54

phillip perin

83 bronze badges

r dataframe yelp


2 Answers

Building on your nice use of tidytext::unnest_tokens(), here is an alternative solution:

library(dplyr)
library(tidyr)
library(tidytext)

df %>%
 select(business_id, categories) %>% 
 group_by(business_id) %>% 
 unnest_tokens(categories, categories, token = 'regex', pattern=", ") %>% 
 mutate(value = 1) %>% 
 spread(categories, value, fill = 0)
#   business_id american barbeque  bars burgers caterers pizza restaurants
#   <chr>          <dbl>    <dbl> <dbl>   <dbl>    <dbl> <dbl>       <dbl>
#   bus_1              0        0     0       1        1     1           0
#   bus_2              0        0     1       0        0     1           1
#   bus_3              1        1     0       0        0     0           1
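The column names in the output above are lower case, presumably because unnest_tokens() lower-cases tokens by default (to_lower = TRUE). If the original capitalisation matters, a variant along these lines may help. It is only a sketch: it assumes the question's df, drops the group_by() step, and uses the illustrative names category and wide.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Toy data copied from the question.
df <- tibble(
  business_id = c("bus_1", "bus_2", "bus_3"),
  categories  = c("Pizza, Burgers, Caterers",
                  "Pizza, Restaurants, Bars",
                  "American, Barbeque, Restaurants")
)

wide <- df %>%
  select(business_id, categories) %>%
  # to_lower = FALSE keeps "Pizza" rather than turning it into "pizza"
  unnest_tokens(category, categories, token = "regex", pattern = ", ",
                to_lower = FALSE) %>%
  mutate(value = 1) %>%
  spread(category, value, fill = 0)

wide
```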

answered Mar 28 at 0:31

Recle Vibal

1013 bronze badges

This is great, thank you so much!

– phillip perin
Mar 28 at 7:32

Hey, I just wanted to update this. I was encountering a duplicate identifiers error. It led me to this thread, github.com/tidyverse/tidyr/issues/426, where the solution has been posted.

– phillip perin
Apr 1 at 20:12

Ah yes, spread() can run into trouble when it cannot identify unique rows. Glad you found the solution. As an alternative, I highly suggest cdata for your data transformation needs. However, spread() fits your current workflow much better.

– Recle Vibal
Apr 2 at 19:02
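To make the comment above concrete: with category data, spread() cannot identify unique rows when the same business/category pair occurs more than once after splitting. The sketch below uses a made-up input with a repeated category (not taken from the Yelp data) and removes repeated pairs with distinct() before spreading. Whether this is what triggered the asker's error is an assumption; dup_df and wide are illustrative names.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: "Pizza" is listed twice for bus_1.
dup_df <- tibble(
  business_id = c("bus_1", "bus_2"),
  categories  = c("Pizza, Burgers, Pizza", "Pizza, Bars")
)

wide <- dup_df %>%
  separate_rows(categories, sep = ", ") %>%  # one row per business/category pair
  distinct(business_id, categories) %>%      # drop repeated pairs so each key is unique
  mutate(value = 1) %>%
  spread(categories, value, fill = 0)

wide
```

Without the distinct() step, the two bus_1/Pizza rows would map to the same output cell, which is the situation spread() refuses to resolve.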


Here is a simple tidyverse solution:

library(tidyverse)

df %>%
  mutate(
    ind = 1,
    tmp = strsplit(categories, ", ")
  ) %>%
  unnest(tmp) %>%
  spread(tmp, ind, fill = 0)
## A tibble: 3 x 9
#  business_id categories                      American Barbeque  Bars Burgers Caterers Pizza Restaurants
#  <chr>       <chr>                              <dbl>    <dbl> <dbl>   <dbl>    <dbl> <dbl>       <dbl>
#1 bus_1       Pizza, Burgers, Caterers               0        0     0       1        1     1           0
#2 bus_2       Pizza, Restaurants, Bars               0        0     1       0        0     1           1
#3 bus_3       American, Barbeque, Restaurants        1        1     0       0        0     0           1
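In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A roughly equivalent sketch, assuming the question's df and using separate_rows() in place of strsplit() plus unnest(), might look like this. The helper column category and the name wide are illustrative; the original categories string is kept by copying it before splitting.

```r
library(dplyr)
library(tidyr)

# Toy data copied from the question.
df <- tibble(
  business_id = c("bus_1", "bus_2", "bus_3"),
  categories  = c("Pizza, Burgers, Caterers",
                  "Pizza, Restaurants, Bars",
                  "American, Barbeque, Restaurants")
)

wide <- df %>%
  mutate(category = categories) %>%        # copy so the original string survives
  separate_rows(category, sep = ", ") %>%  # one row per business/category pair
  mutate(ind = 1) %>%
  pivot_wider(names_from = category,
              values_from = ind,
              values_fill = list(ind = 0)) # absent categories become 0

wide
```

Unlike spread(), pivot_wider() creates the new columns in order of first appearance rather than alphabetically, so here the layout matches the asker's desired_df.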

answered Mar 28 at 0:19

DiceboyT

1,7651 silver badge14 bronze badges

Thank you very much for this condensed answer!

– phillip perin
Mar 28 at 19:40
