R - Yelp data Business category column has multiple categories per business. Want to separate into category specific columns with values of 1 and 0

Thank you in advance to anyone who tries to help with this.

I'm using the Yelp data set, and the question I want to answer is: "Which categories are positively correlated with higher stars within category X (Bars, for example)?"

The issue I'm encountering is that each business has all of its categories lumped together in a single column, with one row per business_id. So I need a way to split out each category, turn each one into its own column, and then flag whether the original categories column contains the category that the column was created for.

My current train of thought is to group_by business_id, unnest_tokens the column, use model.matrix() to produce the split I want, and then join the result back onto the df I'm using. But I can't get model.matrix() to run while keeping business_id attached to each row.



# an example of what I am using #
library(tidyverse)
library(tidytext)  # provides unnest_tokens()

# tibble() replaces the deprecated data_frame()
df <- tibble(
  business_id = c("bus_1", "bus_2", "bus_3"),
  categories  = c("Pizza, Burgers, Caterers",
                  "Pizza, Restaurants, Bars",
                  "American, Barbeque, Restaurants")
)

# what I want it to look like #
desired_df <- tibble(
  business_id = c("bus_1", "bus_2", "bus_3"),
  categories  = c("Pizza, Burgers, Caterers",
                  "Pizza, Restaurants, Bars",
                  "American, Barbeque, Restaurants"),
  Pizza       = c(1, 1, 0),
  Burgers     = c(1, 0, 0),
  Caterers    = c(1, 0, 0),
  Restaurants = c(0, 1, 1),
  Bars        = c(0, 1, 0),
  American    = c(0, 0, 1),
  Barbeque    = c(0, 0, 1)
)

# where I am stuck #
df %>%
  select(business_id, categories) %>%
  group_by(business_id) %>%
  unnest_tokens(categories, categories, token = 'regex', pattern = ", ") %>%
  model.matrix(business_id ~ categories, data = .) %>%
  as_tibble()


Edit: After this post and the answers below, I ran into a "duplicate identifiers" error when using spread(). That led me to https://github.com/tidyverse/tidyr/issues/426, where the fix for my problem was posted. I've reproduced it below.



# duplicating the error with a smaller data.frame #

library(tidyverse)
df <- structure(
  list(age = c("21", "17", "32", "29", "15"),
       gender = structure(c(2L, 1L, 1L, 2L, 2L),
                          .Label = c("Female", "Male"), class = "factor")),
  row.names = c(NA, -5L),
  class = c("tbl_df", "tbl", "data.frame"),
  .Names = c("age", "gender")
)
df
#> # A tibble: 5 x 2
#>   age   gender
#>   <chr> <fct>
#> 1 21    Male
#> 2 17    Female
#> 3 32    Female
#> 4 29    Male
#> 5 15    Male

df %>%
  spread(key = gender, value = age)
#> Error: Duplicate identifiers for rows (2, 3), (1, 4, 5)

# fixing the problem #

df %>%
  group_by_at(vars(-age)) %>%            # group by everything other than the value column
  mutate(row_id = 1:n()) %>% ungroup() %>%  # build a per-group row index
  spread(key = gender, value = age) %>%  # spread
  select(-row_id)                        # drop the index
#> # A tibble: 3 x 2
#>   Female Male
#>   <chr>  <chr>
#> 1 17     21
#> 2 32     29
#> 3 NA     15
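For the category data itself, one plausible trigger for the same error (an assumption, since the failing rows are not shown here) is a business that lists the same category more than once, so that the (business_id, category) pairs are no longer unique. The sketch below uses a made-up `df_dup` and the newer `separate_rows()` and `pivot_wider()` functions from tidyr (the scalar `values_fill = 0` assumes tidyr 1.1 or later), and removes repeated pairs with `distinct()` before widening:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: bus_1 lists "Pizza" twice, so the
# (business_id, category) pairs are not unique.
df_dup <- tibble(
  business_id = c("bus_1", "bus_2"),
  categories  = c("Pizza, Burgers, Pizza", "Pizza, Bars")
)

wide <- df_dup %>%
  separate_rows(categories, sep = ", ") %>%  # one row per business/category
  distinct(business_id, categories) %>%      # drop repeated pairs
  mutate(value = 1) %>%
  pivot_wider(names_from = categories,
              values_from = value,
              values_fill = 0)               # absent category -> 0

wide
```

If the duplicates come from somewhere else (repeated business rows, for instance), the row-index approach above is the safer choice, because it does not silently discard rows.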






Tags: r, dataframe, yelp

asked Mar 27 at 23:54 by phillip perin (edited Apr 1 at 20:15)

























          2 Answers
Building on your use of tidytext::unnest_tokens(), here is an alternative solution:

library(dplyr)
library(tidyr)
library(tidytext)

df %>%
  select(business_id, categories) %>%
  group_by(business_id) %>%
  unnest_tokens(categories, categories, token = 'regex', pattern = ", ") %>%
  mutate(value = 1) %>%                # indicator for "business has this category"
  spread(categories, value, fill = 0)  # one column per category, 0 where absent
#   business_id american barbeque  bars burgers caterers pizza restaurants
#   <chr>          <dbl>    <dbl> <dbl>   <dbl>    <dbl> <dbl>       <dbl>
#   bus_1              0        0     0       1        1     1           0
#   bus_2              0        0     1       0        0     1           1
#   bus_3              1        1     0       0        0     0           1
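Two details of the output above differ from the desired_df in the question: the column names are lower-case (unnest_tokens() lower-cases tokens by default), and the original categories column has been dropped by the select(). A minimal sketch that keeps the original case and joins the indicators back onto df is shown here. It swaps in tidyr's `separate_rows()` and `pivot_wider()` instead of the functions used in the answer, and the scalar `values_fill = 0` assumes tidyr 1.1 or later. The names `indicators` and `result` are illustrative only.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  business_id = c("bus_1", "bus_2", "bus_3"),
  categories  = c("Pizza, Burgers, Caterers",
                  "Pizza, Restaurants, Bars",
                  "American, Barbeque, Restaurants")
)

# One indicator column per category, in order of first appearance.
indicators <- df %>%
  separate_rows(categories, sep = ", ") %>%  # keeps the original case
  mutate(value = 1) %>%
  pivot_wider(names_from = categories,
              values_from = value,
              values_fill = 0)

# Join back so the original categories column is preserved.
result <- left_join(df, indicators, by = "business_id")
result
```

Because the join is keyed on business_id, any other columns in the real data frame (stars, for example) stay attached to each row.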






answered Mar 28 at 0:31 by Recle Vibal















          • This is great, thank you so much!

            – phillip perin
            Mar 28 at 7:32











          • Hey, I just wanted to update this. I was running into a duplicate identifiers error. It led me to this thread, github.com/tidyverse/tidyr/issues/426, where the solution has been posted.

            – phillip perin
            Apr 1 at 20:12











          • Ah yes, spread() can run into trouble when it cannot identify unique rows. Glad you found the solution. As an alternative, I highly recommend cdata for your data transformation needs. That said, spread() fits your current workflow much better.

            – Recle Vibal
            Apr 2 at 19:02

















Here is a simple tidyverse solution:

library(tidyverse)

df %>%
  mutate(
    ind = 1,                             # indicator value for each category
    tmp = strsplit(categories, ", ")     # list column of category vectors
  ) %>%
  unnest(tmp) %>%                        # one row per business/category
  spread(tmp, ind, fill = 0)             # one column per category, 0 where absent
## A tibble: 3 x 9
#  business_id categories                      American Barbeque  Bars Burgers Caterers Pizza Restaurants
#  <chr>       <chr>                              <dbl>    <dbl> <dbl>   <dbl>    <dbl> <dbl>       <dbl>
#1 bus_1       Pizza, Burgers, Caterers               0        0     0       1        1     1           0
#2 bus_2       Pizza, Restaurants, Bars               0        0     1       0        0     1           1
#3 bus_3       American, Barbeque, Restaurants        1        1     0       0        0     0           1
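Once the indicator columns exist, the question's actual goal (which categories go with higher stars within one category, such as Bars) can be sketched roughly as follows. Everything in this example is hypothetical: the `df_stars` data, its `stars` values, and the helper names are invented for illustration. It uses `separate_rows()` and `pivot_wider()` rather than the spread() pipeline above, and the scalar `values_fill = 0` assumes tidyr 1.1 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical ratings; the stars values are made up for illustration.
df_stars <- tibble(
  business_id = c("bus_1", "bus_2", "bus_3", "bus_4"),
  stars       = c(4.5, 3.0, 4.0, 2.5),
  categories  = c("Bars, Pizza", "Bars, Burgers",
                  "Bars, Pizza, Burgers", "Bars")
)

indicators <- df_stars %>%
  separate_rows(categories, sep = ", ") %>%
  distinct(business_id, stars, categories) %>%
  mutate(value = 1) %>%
  pivot_wider(names_from = categories,
              values_from = value,
              values_fill = 0)

# Restrict to the category of interest, then correlate each remaining
# indicator column with stars.
bars  <- indicators %>% filter(Bars == 1)
other <- setdiff(names(bars), c("business_id", "stars", "Bars"))
cors  <- sapply(bars[other], function(x) cor(x, bars$stars))

sort(cors, decreasing = TRUE)
```

In this toy data Pizza comes out positively correlated with stars among bars, while Burgers shows no linear relationship. On real data, categories that never vary within the subset would give NA and should be dropped first.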






answered Mar 28 at 0:19 by DiceboyT















          • Thank you very much for this condensed answer!

            – phillip perin
            Mar 28 at 19:40
















