R - Yelp data Business category column has multiple categories per business. Want to separate into category specific columns with values of 1 and 0
Thank you in advance to anyone who tries to help with this.
I'm using the Yelp dataset, and the question I want to answer is: "Which categories are positively correlated with higher stars within category X (Bars, for example)?"
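For context, once every category is its own 0/1 column, the question in quotes can be approached as a per-column correlation between each indicator and the star rating, restricted to businesses tagged Bars. The sketch below uses base R on a small, entirely hypothetical data frame (the ids, `stars` values and the `Cocktails` category are invented for illustration). A correlation like this only describes association; a regression such as `lm(stars ~ ...)` would be one way to look at several categories at once.

```r
# Hypothetical toy data: ids, star ratings and category indicators are invented
biz <- data.frame(
  business_id = paste0("b", 1:6),
  stars       = c(4.5, 3.0, 4.0, 2.5, 5.0, 3.5),
  Bars        = c(1, 1, 1, 1, 1, 1),
  Pizza       = c(0, 1, 0, 1, 0, 1),
  Cocktails   = c(1, 0, 1, 0, 1, 0)
)

# Keep only the businesses in the category of interest (here: Bars)
bars <- biz[biz$Bars == 1, ]

# Every other indicator column is a candidate category
cat_cols <- setdiff(names(bars), c("business_id", "stars", "Bars"))

# Correlation of each 0/1 indicator with the star rating.
# cor() returns NA (with a warning) for a column that is constant within the subset.
cors <- sapply(bars[cat_cols], function(x) cor(x, bars$stars))
sort(cors, decreasing = TRUE)
```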
The issue I'm running into is that each business's categories are lumped together into a single comma-separated string, with one row per business_id. So I need a way to separate out each category, turn each one into its own column, and then mark whether the original categories string contains the category that column was created for.
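Splitting first and then matching whole category names is safer than testing whether the original string contains a category name: a substring test such as `grepl("Bars", categories)` would also flag a business tagged only "Wine Bars" (a hypothetical category here). A minimal base-R sketch on the example data from this question, with no extra packages:

```r
df <- data.frame(
  business_id = c("bus_1", "bus_2", "bus_3"),
  categories  = c("Pizza, Burgers, Caterers",
                  "Pizza, Restaurants, Bars",
                  "American, Barbeque, Restaurants"),
  stringsAsFactors = FALSE
)

# Split each string into its individual categories
parts <- strsplit(df$categories, ", ", fixed = TRUE)
all_cats <- unique(unlist(parts))  # in order of first appearance

# Start from all zeros, then set a 1 for each category a business lists.
# Indexing columns by name uses exact matching, so "Bars" never matches "Wine Bars".
indicators <- matrix(0, nrow = length(parts), ncol = length(all_cats),
                     dimnames = list(NULL, all_cats))
for (i in seq_along(parts)) indicators[i, parts[[i]]] <- 1

result <- cbind(df, indicators)
result
```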
My current train of thought is to group_by() business_id, split the column with unnest_tokens(), turn the result into the split I want with model.matrix(), and then join that back onto the data frame I'm using. But I can't get model.matrix() to work while keeping business_id attached to each row.
# an example of what I am using #
library(dplyr)
library(tidytext)
df <- tibble(business_id = c("bus_1", "bus_2", "bus_3"),
             categories = c("Pizza, Burgers, Caterers",
                            "Pizza, Restaurants, Bars",
                            "American, Barbeque, Restaurants"))
# what I want it to look like #
desired_df <- tibble(business_id = c("bus_1", "bus_2", "bus_3"),
                     categories = c("Pizza, Burgers, Caterers",
                                    "Pizza, Restaurants, Bars",
                                    "American, Barbeque, Restaurants"),
                     Pizza       = c(1, 1, 0),
                     Burgers     = c(1, 0, 0),
                     Caterers    = c(1, 0, 0),
                     Restaurants = c(0, 1, 1),
                     Bars        = c(0, 1, 0),
                     American    = c(0, 0, 1),
                     Barbeque    = c(0, 0, 1))
# where I am stuck #
df %>%
select(business_id, categories) %>%
group_by(business_id) %>%
unnest_tokens(categories, categories, token = 'regex', pattern=", ") %>%
model.matrix(business_id ~ categories, data = .) %>%
as_data_frame
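The reason business_id gets lost is that model.matrix() returns a bare numeric matrix built from the right-hand side of the formula only: business_id on the left is treated as a response and is not part of that matrix, and with the default intercept one category level is folded into it. A minimal sketch of one way to keep the model.matrix() idea, assuming dplyr >= 1.0 (for across() and rename_with()) and using tidyr::separate_rows() in place of unnest_tokens(): drop the intercept, re-attach business_id by row position, then collapse back to one row per business.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  business_id = c("bus_1", "bus_2", "bus_3"),
  categories  = c("Pizza, Burgers, Caterers",
                  "Pizza, Restaurants, Bars",
                  "American, Barbeque, Restaurants")
)

# One row per (business_id, category) pair
long <- df %>% separate_rows(categories, sep = ", ")

# `~ 0 + categories` drops the intercept, so every category level gets its own
# 0/1 column. Assumes no NA categories: model.matrix() would drop those rows and
# break the positional re-attachment below.
mm <- model.matrix(~ 0 + categories, data = long)

dummies <- as_tibble(mm) %>%
  rename_with(~ sub("^categories", "", .x)) %>%  # "categoriesPizza" -> "Pizza"
  mutate(business_id = long$business_id) %>%     # re-attach the id by row position
  group_by(business_id) %>%
  summarise(across(everything(), max), .groups = "drop")  # one row per business

dummies
```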
Edit: After posting this and trying the answers below, I ran into a "Duplicate identifiers" error from spread(). That led me to this thread, https://github.com/tidyverse/tidyr/issues/426, where the answer to my problem was posted; I've reproduced it below.
# duplicating the error with a smaller data.frame #
library(tidyverse)
df <- structure(list(age = c("21", "17", "32", "29", "15"),
gender = structure(c(2L, 1L, 1L, 2L, 2L), .Label = c("Female", "Male"), class = "factor")),
row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("age", "gender"))
df
#> # A tibble: 5 x 2
#> age gender
#> <chr> <fct>
#> 1 21 Male
#> 2 17 Female
#> 3 32 Female
#> 4 29 Male
#> 5 15 Male
df %>%
spread(key=gender, value=age)
#> Error: Duplicate identifiers for rows (2, 3), (1, 4, 5)
# fixing the problem #
df %>%
group_by_at(vars(-age)) %>% # group by everything other than the value column.
mutate(row_id=1:n()) %>% ungroup() %>% # build group index
spread(key=gender, value=age) %>% # spread
select(-row_id) # drop the index
#> # A tibble: 3 x 2
#> Female Male
#> <chr> <chr>
#> 1 17 21
#> 2 32 29
#> 3 NA 15
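spread() has been superseded by pivot_wider() since tidyr 1.0.0. A sketch of the same row-index fix written with pivot_wider(), rebuilding the small age/gender example with tibble() instead of structure() (same values, gender still a factor); only the reshaping verb changes:

```r
library(dplyr)
library(tidyr)

df <- tibble(
  age    = c("21", "17", "32", "29", "15"),
  gender = factor(c("Male", "Female", "Female", "Male", "Male"))
)

wide <- df %>%
  group_by(gender) %>%                # group by everything other than the value column
  mutate(row_id = row_number()) %>%   # index rows within each gender: 1, 2, 3, ...
  ungroup() %>%
  pivot_wider(names_from = gender, values_from = age) %>%
  select(-row_id)                     # the index only served to make rows unique

wide
```

Without the row_id column, pivot_wider() does not stop with an error the way spread() does; it warns that the values are not uniquely identified and returns list-columns instead.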
r dataframe yelp
edited Apr 1 at 20:15
asked Mar 27 at 23:54 by phillip perin
2 Answers
Building on your nice use of tidytext::unnest_tokens(), you can also use this alternative solution:
library(dplyr)
library(tidyr)
library(tidytext)
df %>%
select(business_id, categories) %>%
group_by(business_id) %>%
unnest_tokens(categories, categories, token = 'regex', pattern=", ") %>%
mutate(value = 1) %>%
spread(categories, value, fill = 0)
# business_id american barbeque bars burgers caterers pizza restaurants
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# bus_1 0 0 0 1 1 1 0
# bus_2 0 0 1 0 0 1 1
# bus_3 1 1 0 0 0 0 1
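spread() has been superseded by pivot_wider() since tidyr 1.0.0. A sketch of the same reshaping with current tidyr verbs; it also swaps unnest_tokens() for tidyr::separate_rows() (a choice of this sketch, not of the answer), because unnest_tokens() lowercases tokens by default, which is why the output above has lowercase column names. values_fill = 0 as a plain number assumes tidyr >= 1.1.0.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  business_id = c("bus_1", "bus_2", "bus_3"),
  categories  = c("Pizza, Burgers, Caterers",
                  "Pizza, Restaurants, Bars",
                  "American, Barbeque, Restaurants")
)

wide <- df %>%
  select(business_id, categories) %>%
  separate_rows(categories, sep = ", ") %>%   # one row per business/category pair, case kept
  mutate(value = 1) %>%
  pivot_wider(names_from = categories, values_from = value,
              values_fill = 0)                # categories a business lacks become 0

wide
```

New columns come out in order of first appearance (Pizza, Burgers, Caterers, ...), matching desired_df; names_sort = TRUE would order them alphabetically instead. Keeping unnest_tokens() with to_lower = FALSE is the other way to preserve the original capitalisation.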
answered Mar 28 at 0:31 by Recle Vibal
This is great, thank you so much!
– phillip perin
Mar 28 at 7:32
Hey, I just wanted to update this. I was encountering a duplicate identifiers error, which led me to this thread: github.com/tidyverse/tidyr/issues/426, where the solution has been posted.
– phillip perin
Apr 1 at 20:12
Ah yes, spread() can run into difficulties when it cannot identify unique rows. Glad you found the solution. As an alternative, I highly suggest cdata for your data transformation needs. However, spread() fits your current workflow much better.
– Recle Vibal
Apr 2 at 19:02
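In the category setting, the "Duplicate identifiers" error means that some business_id/category pair occurs more than once after splitting (for example, a category repeated within one string, or a business row duplicated). The row_id trick from the linked issue would keep each duplicate as a separate row; for 0/1 indicators it may be simpler to drop the repeats with distinct() before pivoting. A sketch on a hypothetical input in which bus_1 lists Pizza twice, assuming tidyr >= 1.1.0:

```r
library(dplyr)
library(tidyr)

# Hypothetical input: "Pizza" appears twice for bus_1
df_dup <- tibble(
  business_id = c("bus_1", "bus_2"),
  categories  = c("Pizza, Bars, Pizza", "Pizza, Restaurants")
)

wide <- df_dup %>%
  separate_rows(categories, sep = ", ") %>%
  distinct(business_id, categories) %>%   # keep each business/category pair once
  mutate(value = 1) %>%
  pivot_wider(names_from = categories, values_from = value, values_fill = 0)

wide
```

Passing values_fn = max to pivot_wider() would be another way to collapse the repeats into a single 1.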
Here is a simple tidyverse solution:
library(tidyverse)
df %>%
mutate(
ind = 1,
tmp = strsplit(categories, ", ")
) %>%
unnest(tmp) %>%
spread(tmp, ind, fill = 0)
## A tibble: 3 x 9
# business_id categories American Barbeque Bars Burgers Caterers Pizza Restaurants
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 bus_1 Pizza, Burgers, Caterers 0 0 0 1 1 1 0
#2 bus_2 Pizza, Restaurants, Bars 0 0 1 0 0 1 1
#3 bus_3 American, Barbeque, Restaurants 1 1 0 0 0 0 1
answered Mar 28 at 0:19 by DiceboyT
Thank you very much for this condensed answer!
– phillip perin
Mar 28 at 19:40