Stack of irregular wide-form datasets in one file; make to clean long-form data in a single pipe



















I have a stack of wide(ish) data frames provided in a single spreadsheet. The data are vote counts for various political parties in villages which are nested within districts. This is how they look:



df_in <- data.frame(
  X1 = c(rep("District1", 4), rep("District2", 3)),
  X2 = c("Party", "PartyA", "PartyB", "PartyC", "Party", "PartyA", "PartyB"),
  X3 = c("Village1", "44", "12", "3", "Village3", "7", "88"),
  X4 = c("Village2", "34", "19", "2", "Village4", "90", "65"),
  X5 = c("", "", "", "", "Village5", "45", "62")
)


I want to make a long-form village/party vote count dataset that looks like this:



df_out <- data.frame(
  district = c(rep("District1", 6), rep("District2", 6)),
  village = c(rep("Village1", 3), rep("Village2", 3), rep("Village3", 2), rep("Village4", 2), rep("Village5", 2)),
  party = c(rep(c("PartyA", "PartyB", "PartyC"), 2), rep(c("PartyA", "PartyB"), 3)),
  votes = c(44, 12, 3, 34, 19, 2, 7, 88, 90, 65, 45, 62)
)


I'm looking for a way to get from df_in to df_out in a single pipe (since I have a lot of spreadsheets that look similar to this one).










      r dplyr tidyr purrr














asked Mar 21 at 21:09 by lethalSinger
edited Mar 22 at 16:24 by lethalSinger






















          1 Answer














          A solution using the tidyverse.



library(tidyverse)

dat <- df_in %>%
  # Split the data frame
  split(f = .$X1) %>%
  # Remove columns with all ""
  map(~select_if(.x, function(x) !all(x == ""))) %>%
  # Use the first row as the column name
  map(~set_names(.x, nm = .x %>% slice(1) %>% unlist)) %>%
  # Rename the District column
  map(~set_names(.x, nm = c("District", names(.x)[2:ncol(.x)]))) %>%
  # Remove the first row
  map(~slice(.x, 2:n())) %>%
  # Gather the data frames
  map(~gather(.x, village, votes, starts_with("Village"))) %>%
  # Combine all results
  bind_rows()
dat
#     District  Party  village votes
# 1  District1 PartyA Village1    44
# 2  District1 PartyB Village1    12
# 3  District1 PartyC Village1     3
# 4  District1 PartyA Village2    34
# 5  District1 PartyB Village2    19
# 6  District1 PartyC Village2     2
# 7  District2 PartyA Village3     7
# 8  District2 PartyB Village3    88
# 9  District2 PartyA Village4    90
# 10 District2 PartyB Village4    65
# 11 District2 PartyA Village5    45
# 12 District2 PartyB Village5    62
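The result above leaves `votes` as character and uses the column names `District` and `Party`, whereas the `df_out` in the question has lowercase names, numeric votes, and the column order district/village/party/votes. The following is a minimal sketch of a variant that should reproduce `df_out`, not a replacement for the approach above. It assumes tidyr 1.0.0 or later (for `pivot_longer()`), that the first row of each district block holds the village headers, and that unused header cells are empty strings rather than `NA`. The names `df_long`, `block`, `header`, `keep`, and `body` are illustrative and do not come from the original post.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Input copied from the question; stringsAsFactors = FALSE keeps the
# columns as character on R versions before 4.0 as well.
df_in <- data.frame(
  X1 = c(rep("District1", 4), rep("District2", 3)),
  X2 = c("Party", "PartyA", "PartyB", "PartyC", "Party", "PartyA", "PartyB"),
  X3 = c("Village1", "44", "12", "3", "Village3", "7", "88"),
  X4 = c("Village2", "34", "19", "2", "Village4", "90", "65"),
  X5 = c("", "", "", "", "Village5", "45", "62"),
  stringsAsFactors = FALSE
)

df_long <- df_in %>%
  # One data frame per district block
  split(.$X1) %>%
  map(function(block) {
    # Assumption: the first row of each block is its header row
    header <- unlist(block[1, ], use.names = FALSE)
    # Assumption: a blank header cell marks a column unused by this block
    keep <- header != ""
    # Drop the header row and the unused columns
    body <- block[-1, keep, drop = FALSE]
    # First two columns are district and party; the rest are village names
    names(body) <- c("district", "party", header[keep][-(1:2)])
    pivot_longer(body, cols = -c(district, party),
                 names_to = "village", values_to = "votes")
  }) %>%
  bind_rows() %>%
  # Vote counts arrive as text because they shared columns with the headers
  mutate(votes = as.numeric(votes)) %>%
  arrange(district, village, party) %>%
  select(district, village, party, votes)
```

Because everything after `df_in` is one pipe, the same chain could be wrapped in a function and mapped over several spreadsheets, provided they share this layout.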





answered Mar 24 at 1:40 by www




























