Keep first row by multiple columns in an R data.table


I'd like to keep only the first row of a data.table for each group, where the groups are defined by multiple columns.

This is straightforward with a single column, e.g.:

    library(data.table)
    (dt <- data.table(x = c(1, 1, 1, 2),
                      y = c(1, 1, 2, 2),
                      z = c(1, 2, 1, 2)))
    #    x y z
    # 1: 1 1 1
    # 2: 1 1 2
    # 3: 1 2 1
    # 4: 2 2 2
    dt[!duplicated(x)] # Removes rows 2-3
    #    x y z
    # 1: 1 1 1
    # 2: 2 2 2

But none of the following approaches works when deduplicating on two columns, i.e. in this case removing only row 2:

    dt[!duplicated(x, y)] # Returns the original data set unchanged
    #    x y z
    # 1: 1 1 1
    # 2: 1 1 2
    # 3: 1 2 1
    # 4: 2 2 2
    dt[!duplicated(list(x, y))] # Same as above
    dt[!duplicated(c("x", "y"))] # Same as above
    dt[!duplicated(list("x", "y"))] # Same as above
    dt[!duplicated(c(x, y))] # Only removes duplicates in the first column
    #    x y z
    # 1: 1 1 1
    # 2: 2 2 2

Except for this, which only works in certain cases:

    dt[!duplicated(paste0(x, y))]
    #    x y z
    # 1: 1 1 1
    # 2: 1 2 1
    # 3: 2 2 2
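
To illustrate why the paste0 approach is fragile, here is a minimal sketch. The table dt2 and its values are hypothetical, chosen only so that two different (x, y) pairs paste to the same string; the "|" separator is likewise just one possible workaround for numeric columns.

```r
library(data.table)

# Hypothetical data: (1, 11) and (11, 1) are different pairs,
# but both concatenate to the string "111".
dt2 <- data.table(x = c(1, 11), y = c(11, 1), z = c(1, 2))

paste0(dt2$x, dt2$y)
# [1] "111" "111"

# The second row is treated as a duplicate and dropped, although it is not one
collided <- dt2[!duplicated(paste0(x, y))]

# A separator that cannot occur in the values keeps the pairs distinct
separated <- dt2[!duplicated(paste(x, y, sep = "|"))]
```

Here collided has one row and separated has two, so the plain paste0 version silently loses data whenever such collisions exist.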









Tags: r, duplicates, data.table
















asked Jul 23 '14 at 5:44 by Max Ghenis






















2 Answers






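data.table provides S3 methods for `unique`, `duplicated` and `anyDuplicated`, and each of them accepts a `by` argument naming the columns to compare.

    unique(dt, by = c("x", "y"))

will give you what you want: the first row for each combination of x and y.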
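
As a rough sketch of how the three methods relate, using the dt from the question (the names first_rows, same_rows and first_dup are only illustrative):

```r
library(data.table)

dt <- data.table(x = c(1, 1, 1, 2),
                 y = c(1, 1, 2, 2),
                 z = c(1, 2, 1, 2))

# Keep the first row of each (x, y) combination
first_rows <- unique(dt, by = c("x", "y"))

# The same rows, selected with the logical vector from duplicated()
same_rows <- dt[!duplicated(dt, by = c("x", "y"))]

# Index of the first duplicated row under the same `by`, or 0 if there is none
first_dup <- anyDuplicated(dt, by = c("x", "y"))
```

Both first_rows and same_rows contain original rows 1, 3 and 4, and first_dup is 2 because row 2 repeats the (1, 1) pair of row 1.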

















answered Jul 23 '14 at 5:53 by mnel























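When this was written, data.table's `duplicated` method worked by key by default. From ?duplicated.data.table:

    ‘duplicated’ returns a logical vector indicating which rows of a
    ‘data.table’ have duplicate rows (by key).

So set the key to the grouping columns first. Later data.table releases compare all columns by default, so pass `by = key(dt)` explicitly to get the same result on any version:

    setkey(dt, x, y)
    dt[!duplicated(dt, by = key(dt))]
    ##    x y z
    ## 1: 1 1 1
    ## 2: 1 2 1
    ## 3: 2 2 2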
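
A small sketch of what setting the key does and how it can be reused as the `by` columns, again with the dt from the question (dedup is just an illustrative name):

```r
library(data.table)

dt <- data.table(x = c(1, 1, 1, 2),
                 y = c(1, 1, 2, 2),
                 z = c(1, 2, 1, 2))

# setkey() sorts dt by x, then y, and records those columns as the key
setkey(dt, x, y)
key(dt)
# [1] "x" "y"

# Passing the key as `by` makes the comparison columns explicit,
# so the result does not depend on the default of `by`
dedup <- dt[!duplicated(dt, by = key(dt))]
```

Note that setkey() reorders the table by reference, so on unsorted data the "first" row of each group is the first one after sorting by the key.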
















answered Jul 23 '14 at 5:55 by Jake Burkhead












• It's by key by default; you can specify the `by` variables.
  – mnel
  Jul 23 '14 at 5:56

• @mnel Yeah, I upvoted your answer. I just thought this might shed some light on why the behaviour makes sense, although it might seem strange.
  – Jake Burkhead
  Jul 23 '14 at 5:56

• dt[!duplicated(dt[, c("x", "y"), with = FALSE])] # seems to work
  – akrun
  Jul 23 '14 at 6:00

• Thanks! Upvoted both but chose @mnel's for conciseness.
  – Max Ghenis
  Jul 23 '14 at 6:03
















