Keep first row by multiple columns in an R data.table
I'd like to keep only the first row of each group in a data.table, where the groups are defined by multiple columns.

This is straightforward with a single column, e.g.:
library(data.table)
(dt <- data.table(x = c(1, 1, 1, 2),
                  y = c(1, 1, 2, 2),
                  z = c(1, 2, 1, 2)))
#    x y z
# 1: 1 1 1
# 2: 1 1 2
# 3: 1 2 1
# 4: 2 2 2
dt[!duplicated(x)]  # removes rows 2 and 3
#    x y z
# 1: 1 1 1
# 2: 2 2 2
But none of the following approaches works when deduplicating on two columns, i.e. in this case removing only row 2:
dt[!duplicated(x, y)]  # returns the original data set unchanged
#    x y z
# 1: 1 1 1
# 2: 1 1 2
# 3: 1 2 1
# 4: 2 2 2
dt[!duplicated(list(x, y))]      # same as above
dt[!duplicated(c("x", "y"))]     # same as above
dt[!duplicated(list("x", "y"))]  # same as above
dt[!duplicated(c(x, y))]         # only removes duplicates from the first column
#    x y z
# 1: 1 1 1
# 2: 2 2 2
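The only attempt that works is this one, and only in certain cases, because pasting without a separator can make distinct pairs collide (e.g. x = 1, y = 11 and x = 11, y = 1 both become "111"):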
dt[!duplicated(paste0(x, y))]
#    x y z
# 1: 1 1 1
# 2: 1 2 1
# 3: 2 2 2
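A small base-R sketch of why the attempts above misbehave, assuming default R semantics: the second positional argument of duplicated() is incomparables (values that are never flagged as duplicates), not a second column, and paste0() without a separator can merge distinct pairs. The vectors a and b are made-up values chosen only to trigger a collision.

```r
x <- c(1, 1, 1, 2)
y <- c(1, 1, 2, 2)

# duplicated(x, y) passes y as `incomparables`, so the values 1 and 2
# are never compared and nothing is flagged as a duplicate.
dup_xy <- duplicated(x, y)
print(dup_xy)  # all FALSE

# paste0() without a separator can turn distinct pairs into the same key.
a <- c(1, 11)
b <- c(11, 1)
print(paste0(a, b))              # "111" "111"
print(duplicated(paste0(a, b)))  # FALSE TRUE: a false duplicate

# Comparing whole rows of a data.frame avoids the collision.
print(duplicated(data.frame(a, b)))  # FALSE FALSE
print(duplicated(data.frame(x, y)))  # FALSE TRUE FALSE FALSE
```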
Tags: r, duplicates, data.table
asked Jul 23 '14 at 5:44 by Max Ghenis
2 Answers
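data.table provides S3 methods for unique, duplicated and anyDuplicated, each of which accepts a by argument naming the columns to compare.

unique(dt, by = c('x', 'y'))

will give you what you want: the first row of each distinct combination of x and y.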
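For reference, a self-contained sketch (assuming a current data.table release) that rebuilds the example table and checks the result; first_rows and same_rows are illustrative names, and the duplicated() line is the equivalent row filter rather than part of the answer above.

```r
library(data.table)

dt <- data.table(x = c(1, 1, 1, 2),
                 y = c(1, 1, 2, 2),
                 z = c(1, 2, 1, 2))

# Keep the first row of each distinct (x, y) combination,
# i.e. rows 1, 3 and 4 of dt, in their original order.
first_rows <- unique(dt, by = c("x", "y"))
print(first_rows)

# The same filter expressed with the duplicated() method.
same_rows <- dt[!duplicated(dt, by = c("x", "y"))]
```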
answered Jul 23 '14 at 5:53 by mnel
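data.table does duplicated by key. From ?duplicated.data.table:

‘duplicated’ returns a logical vector indicating which rows of a ‘data.table’ have duplicate rows (by key).

That was the documented default when this answer was written; since data.table 1.9.8, duplicated() and unique() compare all columns by default, so pass the key explicitly to get the by-key behaviour:

setkey(dt, x, y)
dt[!duplicated(dt, by = key(dt))]  # by = key(dt) is c("x", "y") here
##    x y z
## 1: 1 1 1
## 2: 1 2 1
## 3: 2 2 2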
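A sketch contrasting the two behaviours, assuming data.table 1.9.8 or later, where the default by is every column: with all three columns compared no row is a duplicate, while by = key(dt) drops only the repeated (x, y) pair. The names all_cols and by_key are illustrative.

```r
library(data.table)

dt <- data.table(x = c(1, 1, 1, 2),
                 y = c(1, 1, 2, 2),
                 z = c(1, 2, 1, 2))
setkey(dt, x, y)

# Default `by` (all columns): every (x, y, z) triple is distinct,
# so all four rows are kept.
all_cols <- dt[!duplicated(dt)]

# Explicit key: rows are compared on x and y only, so row 2 is dropped.
by_key <- dt[!duplicated(dt, by = key(dt))]
```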
answered Jul 23 '14 at 5:55 by Jake Burkhead
by = key by default; you can specify the by variables.
– mnel
Jul 23 '14 at 5:56
@mnel Yeah, I upvoted your answer. I just thought this might shed some light on why the behaviour makes sense, even though it might seem strange.
– Jake Burkhead
Jul 23 '14 at 5:56
dt[!duplicated(dt[, c("x", "y"), with = FALSE])]  # seems to work
– akrun
Jul 23 '14 at 6:00
Thanks! Upvoted both but chose @mnel's for conciseness.
– Max Ghenis
Jul 23 '14 at 6:03
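Since the question asks for the first row per group, grouping with by is another option. The sketch below, assuming a recent data.table, shows .SD[1] and .I[1] alongside akrun's column-subset approach; all result names are illustrative.

```r
library(data.table)

dt <- data.table(x = c(1, 1, 1, 2),
                 y = c(1, 1, 2, 2),
                 z = c(1, 2, 1, 2))

# .SD[1] returns the first row of the remaining columns in each (x, y) group;
# the grouping columns come first in the result.
first_by_group <- dt[, .SD[1], by = .(x, y)]

# .I[1] returns the original row number of each group's first row
# (in column V1), which can then be used to subset dt directly.
first_idx <- dt[, .I[1], by = .(x, y)]$V1
first_rows <- dt[first_idx]

# akrun's approach: duplicated() on a table holding only x and y,
# which (with the current all-columns default) compares x and y.
via_subset <- dt[!duplicated(dt[, c("x", "y"), with = FALSE])]
```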