Subset multiple different rows of a Data frame


Hi, how can I draw two different random samples of N rows each from a data frame? See the example below.

I have df, the main dataset, and I need two subsets of it. I created each subset by sampling 3 random rows from df, but I need the two subsets to have no rows in common.

> df = data.frame(matrix(rnorm(20), nrow=10))
> df
            X1          X2
1   0.19234071 -0.86702704
2  -0.18264853  1.75276062
3   0.75824257 -0.51314220
4  -0.84571563 -1.24841675
5   0.75470152  1.51408945
6   1.04546517  1.33292716
7  -0.51449011 -1.51275633
8   1.36014747  0.07400024
9  -0.02397481  0.17177997
10 -1.37967248 -0.50416489

> df1 = df[sample(nrow(df), 3), ]
> df1
           X1         X2
10 -1.3796725 -0.5041649
1   0.1923407 -0.8670270
4  -0.8457156 -1.2484167

> df2 = df[sample(nrow(df), 3), ]
> df2
          X1         X2
3  0.7582426 -0.5131422
4 -0.8457156 -1.2484167
6  1.0454652  1.3329272

As you can see, the random subsets df1 and df2 share a row (row 4). I need two random subsets of the data frame that contain different rows.

r







asked Mar 25 at 15:35 by Mr. Buster

  • split(head(df[sample(nrow(df)),]), 1:2)?
    – Frank, Mar 25 at 16:07


3 Answers

If you want to split the data into two disjoint sets, you can sample an index once and use it to split the data frame: df1 takes the sampled rows and df2 takes all the remaining rows, so the two can never share a row. Something like this:

> set.seed(42)
> idx <- sample(seq_len(nrow(df)), 3)
> df1 <- df[idx, ]
> df2 <- df[-idx, ]
> df1
         X1        X2
10 1.359814 0.6919378
9  1.248144 0.9783253
3  1.903994 0.4371896
> df2
          X1          X2
1 -0.3743900  0.54040310
2 -0.3204993  0.02383999
4 -0.2552918  0.94148533
5 -0.7327228 -1.25263998
6 -1.0648850  0.06567222
7 -0.2147909 -0.19137447
8  1.2148835  1.36361765

For more complex splits, see caret::createDataPartition.
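
If both subsets need exactly 3 rows, as in the question, one option is to draw 6 distinct row indices in a single call to sample and give the first 3 to one subset and the last 3 to the other. The following is a minimal sketch of that idea, not part of the original answer; the variable n and the seed are chosen here purely for illustration, and it assumes the data frame has at least 2 * n rows.

```r
set.seed(42)                                    # only to make the example reproducible
df <- data.frame(matrix(rnorm(20), nrow = 10))  # same shape as the data in the question

n <- 3                                          # rows per subset (illustrative name)
# sample() draws without replacement by default, so all 2 * n indices are distinct
idx <- sample(nrow(df), 2 * n)

df1 <- df[idx[1:n], ]                           # first n sampled rows
df2 <- df[idx[(n + 1):(2 * n)], ]               # next n sampled rows

# The two subsets cannot share a row, so this is always character(0)
intersect(rownames(df1), rownames(df2))
```

Drawing all indices in one call is what guarantees disjointness; two separate calls to sample would be independent and could repeat a row.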








answered Mar 25 at 15:39 by Sonny



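We can create a function if we need to reuse the same logic:

f1 <- function(data, n) {
  data[sample(nrow(data), n), ]
}

Or, if we need to create a train/test dataset, we can use split:

lst1 <- split(df, seq_len(nrow(df)) %in% sample(nrow(df), 3))

Here lst1 is a list of two data frames: lst1[["TRUE"]] holds the 3 sampled rows and lst1[["FALSE"]] holds the remaining rows, so the two never overlap. Separate calls to f1, by contrast, are independent draws and can return the same row twice.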
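
A function along these lines can also return several non-overlapping samples at once. The sketch below is an illustration under stated assumptions rather than part of the answer: disjoint_samples and its arguments n and k are hypothetical names. It draws k * n distinct row indices once and hands the selected rows to split, which returns a list of k data frames with n rows each.

```r
# Hypothetical helper: k disjoint random samples of n rows each from `data`
disjoint_samples <- function(data, n, k = 2) {
  stopifnot(n * k <= nrow(data))        # there must be enough rows to go around
  idx <- sample(nrow(data), n * k)      # n * k distinct row indices
  groups <- rep(seq_len(k), each = n)   # group label for each sampled index
  split(data[idx, , drop = FALSE], groups)
}

set.seed(1)                             # only for reproducibility
df <- data.frame(matrix(rnorm(20), nrow = 10))
parts <- disjoint_samples(df, n = 3)
parts[["1"]]                            # first subset, 3 rows
parts[["2"]]                            # second subset, 3 different rows
```

The list elements are named after the group labels, so parts[["1"]] and parts[["2"]] play the role of df1 and df2 from the question.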







answered Mar 25 at 15:37 by akrun (edited Mar 25 at 15:56)



You could also assign each row at random to one of two groups and subset on the group label:

idx <- sample(seq(1, 2), size = nrow(df), replace = TRUE, prob = c(.8, .2))
set1 <- df[idx == 1, ]
set2 <- df[idx == 2, ]

Output:

> set1
            X1         X2
1  -0.85768451 -0.1545485
2  -0.76420259  1.2054883
3  -0.91973457  1.4867429
6  -1.07558176  0.2527374
7   0.03189408  1.4057502
8   0.64270649  1.3742131
9   1.59246097 -0.3845688
10 -0.14158552 -1.5792062

> set2
          X1         X2
4 -0.6317524 0.06571271
5  0.5005460 0.46277511

Note: you can change the split proportions with the prob argument of sample. I have used 80/20.
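
Because each row is assigned independently according to prob, the sizes of set1 and set2 vary from run to run, and one of them can even come out empty on a small data frame. If an exact 80/20 split is wanted, one possible variant is to build the group labels with fixed counts and then shuffle them. This is a sketch only; n_first and labels are illustrative names, not part of the answer above.

```r
set.seed(7)                                     # only for reproducibility
df <- data.frame(matrix(rnorm(20), nrow = 10))

n_first <- round(0.8 * nrow(df))                # number of rows for the first set
# A fixed number of 1s and 2s, shuffled into random order
labels <- sample(rep(1:2, times = c(n_first, nrow(df) - n_first)))

set1 <- df[labels == 1, ]
set2 <- df[labels == 2, ]
c(nrow(set1), nrow(set2))                       # 8 and 2 for a 10-row data frame
```

Every row receives exactly one label, so the two sets are disjoint and together cover the whole data frame.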








answered Mar 25 at 18:57 by Rushabh

