Removing true duplicates from Greenplum table

I am trying to remove true duplicates from a table. I have removed dupes many times in the past, but I can't figure out what's wrong with my syntax this time.

My code:

DELETE 
FROM my_table_name 
WHERE ( 
 column1, column2, column3, column4, column5, column6, column7, column8, column9) IN
 ( 
 SELECT Row_number() OVER( partition BY column1, column2,column3, column4,column5,column6,column7,column8 ORDER BY column2 DESC, column3 ASC ) AS row_num,
 column1, 
 column2, 
 column3, 
 column4, 
 column5, 
 column6, 
 column7, 
 column8, 
 column9 
 FROM my_table_name 
 WHERE column1='some_value') a 
WHERE row_num=2;

Error

********** Error **********

ERROR: syntax error at or near "a"
SQL state: 42601
Character: 1607

I can see that the error comes from aliasing the subquery as a, but I can't pinpoint what's wrong here.

Any help is appreciated.

Edit 1:
If I remove a, I get the error below.

********** Error **********

ERROR: syntax error at or near "where"
SQL state: 42601
Character: 1608

edited Mar 27 at 18:42

asked Mar 27 at 18:32

Pirate X


Try removing the 'a' alias; you are not even using it.

– Michael Muryn
Mar 27 at 18:36

I'm not familiar with Greenplum tables at all (I work specifically in T-SQL), but help me understand this: why do you have 3 where clauses for 2 queries, albeit nested? If I were in T-SQL, I'd probably suggest changing the 3rd 'where' to an 'and' filter.

– Tiny Haitian
Mar 27 at 19:11

I tried 'and' instead of 'where' in the last line as well; it didn't help. The reason I have the last 'where' clause is that I cannot put the row_num filter inside subquery 'a', because row_num comes from a function and is not a column name.

– Pirate X
Mar 27 at 19:15

Tags: sql, duplicates, greenplum

1 Answer

If the rows are true duplicates (identical in every column), you can't delete all but one of them in a single command, because a predicate on the table's own columns matches every copy. You either delete all the duplicates and then insert one version of each row back, or build a new table without the duplicates (preferred).
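
To see this in action, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for Greenplum. That substitution is an assumption made only so the example runs anywhere; it needs SQLite 3.25 or later for window functions. The table foo and the helper name delete_second_copy are hypothetical. The statement is a syntactically valid form of the query in the question: the alias and the row_num filter sit inside the IN list, and the subquery returns the same number of columns as the tuple on the left. Even so, the tuple comparison matches every identical copy, not only the one numbered 2.

```python
import sqlite3


def delete_second_copy(rows):
    """Try to delete only the copy numbered 2; return the rows that remain."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table foo (id int, fname text)")
    conn.executemany("insert into foo values (?, ?)", rows)
    conn.execute(
        """
        delete from foo
        where (id, fname) in (
            select id, fname
            from (
                select row_number() over (partition by id, fname) as row_num,
                       id, fname
                from foo
            ) as a
            where a.row_num = 2
        )
        """
    )
    remaining = conn.execute("select id, fname from foo order by id").fetchall()
    conn.close()
    return remaining


print(delete_second_copy([(1, "jon"), (1, "jon"), (3, "sue")]))
```

Both (1, 'jon') rows are removed and only (3, 'sue') is left, which is why a delete-and-reinsert or a rebuild is needed.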

Let's start with the preferred method, which is to create a new table without the duplicates. This uses disk space more efficiently than deleting in place, which leaves you with a fragmented table.

Example:

create table foo
(id int, fname text)
with (appendonly=true)
distributed by (id);

Insert some data with duplicates:

insert into foo values (1, 'jon');
insert into foo values (1, 'jon');
insert into foo values (2, 'bill');
insert into foo values (2, 'bill');
insert into foo values (3, 'sue');
insert into foo values (4, 'ted');
insert into foo values (4, 'ted');
insert into foo values (4, 'ted');
insert into foo values (4, 'ted');

Create a new version of the table without the duplicates:

create table foo_new with (appendonly=true) as
select id, fname
from (
 select row_number() over (partition by id) as row_num, id, fname
 from foo
 ) as sub
where sub.row_num = 1
distributed by (id);

And now rename the tables:

alter table foo rename to foo_old;
alter table foo_new rename to foo;
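
The rebuild can be tried out locally before running it on a real table. The sketch below assumes Python's sqlite3 module as a stand-in for Greenplum (SQLite 3.25 or later), so the Greenplum-only clauses with (appendonly=true) and distributed by are left out. The helper name rebuild_without_duplicates is hypothetical. It partitions by id, as in the statement above; for true duplicates spread across many columns you would presumably list every column in partition by.

```python
import sqlite3

# Same sample data as the insert statements above.
SAMPLE_ROWS = [
    (1, "jon"), (1, "jon"),
    (2, "bill"), (2, "bill"),
    (3, "sue"),
    (4, "ted"), (4, "ted"), (4, "ted"), (4, "ted"),
]


def rebuild_without_duplicates(rows):
    """Build foo_new with one row per id, swap the names, return the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table foo (id int, fname text)")
    conn.executemany("insert into foo values (?, ?)", rows)
    conn.execute(
        """
        create table foo_new as
        select id, fname
        from (
            select row_number() over (partition by id) as row_num, id, fname
            from foo
        ) as sub
        where sub.row_num = 1
        """
    )
    conn.execute("alter table foo rename to foo_old")
    conn.execute("alter table foo_new rename to foo")
    deduped = conn.execute("select id, fname from foo order by id").fetchall()
    old_count = conn.execute("select count(*) from foo_old").fetchone()[0]
    conn.close()
    return deduped, old_count


print(rebuild_without_duplicates(SAMPLE_ROWS))
```

The original nine rows stay available in foo_old, so the swap can be checked before the old table is dropped.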

The second method is to use DELETE, but you'll see that it needs more steps to complete.

First, create a temp table with the IDs you want to delete. Greenplum typically doesn't enforce primary keys, but your data still has a logical PK: columns like customer_id, product_id, etc. So, find the duplicates first based on that PK.

drop table if exists foo_pk_delete;
create temporary table foo_pk_delete with (appendonly=true) as
select id
from foo
group by id
having count(*) > 1
distributed by (id);

Next, get the entire row for each duplicate but only one version of it.

drop table if exists foo_dedup;
create temporary table foo_dedup with (appendonly=true) as
select id, fname
from (
 select row_number() over (partition by f.id) as row_num, f.id, f.fname
 from foo f 
 join foo_pk_delete fd on f.id = fd.id
 ) as sub
where sub.row_num = 1
distributed by (id);

Now you can delete the duplicates:

delete 
from foo f
using foo_pk_delete fk 
where f.id = fk.id;

And then you can insert the deduplicated data back into the table.

insert into foo (id, fname)
select id, fname from foo_dedup;

You'll want to vacuum your table after this data manipulation.

vacuum foo;
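
For comparison, here is the DELETE route end to end, again as a sketch on Python's sqlite3 module rather than on Greenplum (SQLite 3.25 or later). Two things differ from the statements above and are assumptions of this example: SQLite has no delete ... using, so the delete filters with in (select ...) instead, and vacuum is skipped because the in-memory database is discarded anyway. The helper name dedup_in_place is hypothetical.

```python
import sqlite3


def dedup_in_place(rows):
    """Delete every duplicated id, then re-insert one row for each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table foo (id int, fname text)")
    conn.executemany("insert into foo values (?, ?)", rows)
    conn.executescript(
        """
        -- 1. ids that occur more than once
        create temporary table foo_pk_delete as
        select id from foo group by id having count(*) > 1;

        -- 2. one full row per duplicated id
        create temporary table foo_dedup as
        select id, fname
        from (
            select row_number() over (partition by f.id) as row_num,
                   f.id as id, f.fname as fname
            from foo f
            join foo_pk_delete fd on f.id = fd.id
        ) as sub
        where sub.row_num = 1;

        -- 3. remove every copy of the duplicated ids
        delete from foo where id in (select id from foo_pk_delete);

        -- 4. put one copy back
        insert into foo (id, fname)
        select id, fname from foo_dedup;
        """
    )
    result = conn.execute("select id, fname from foo order by id").fetchall()
    conn.close()
    return result


rows = [(1, "jon"), (1, "jon"), (2, "bill"), (2, "bill"), (3, "sue"),
        (4, "ted"), (4, "ted"), (4, "ted"), (4, "ted")]
print(dedup_in_place(rows))
```

Rows whose id was never duplicated, such as (3, 'sue'), are not touched by any of the four steps.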

answered Mar 27 at 19:47

Jon Roberts
