Compare two large tables by various attributes - PostgreSQL
I'm having trouble coming up with an efficient query that compares two tables with various attributes. This is for a report for an online retailer who has several hundred thousand SKUs available for sale. Each SKU is a variation of a "parent" product. They sell on various marketplaces and need to see if there are items that are not available for sale in various places.
There is a table with all parent products, and another table with all variations and their corresponding SKUs. A third table holds a complete list of each SKU (variation) and its corresponding marketplace, where the combination of sku + marketplace is unique.
The database is PostgreSQL.
Table structures are as follows:
Product Table:
Products
id | parent_sku | vendor_id
---------------------------
1  | ABC        | 100
2  | DEF        | 200
3  | XYZ        | 100
Variation Table:
Variations
id | parent_id | sku
----------------------
1  | 1         | ABC-1
2  | 1         | ABC-2
3  | 1         | ABC-3
4  | 2         | DEF-1
5  | 2         | DEF-2
6  | 3         | XYZ-1
7  | 3         | XYZ-2
Marketplace Table:
MarketplaceData
id | sku   | marketplace | price
--------------------------------
1  | ABC-1 | website1    | 99.99
2  | ABC-2 | website1    | 99.99
3  | ABC-3 | website1    | 89.99
4  | DEF-1 | website1    | 29.99
5  | DEF-2 | website1    | 29.99
6  | XYZ-1 | website1    | 39.99
7  | XYZ-2 | website1    | 39.99
8  | ABC-1 | website2    | 99.99
9  | ABC-2 | website2    | 99.99
10 | ABC-3 | website2    | 99.99
11 | DEF-1 | website2    | 29.99
12 | DEF-2 | website2    | 29.99
13 | XYZ-1 | website2    | 34.99
14 | XYZ-2 | website2    | 34.99
I have a working query, but it takes extremely long to execute and is very taxing.
SELECT DISTINCT parent_id FROM Variations
WHERE sku IN (SELECT sku FROM MarketplaceData WHERE marketplace IN ('website1','website2'))
AND sku NOT IN (SELECT sku FROM MarketplaceData WHERE marketplace IN ('website3','website4'))
LIMIT 20 OFFSET 0
Since each sku + marketplace dataset has close to 400,000 rows and the MarketplaceData table contains over 2 million rows, this query takes forever to execute.
In terms of indexing, the id column is the primary key of each table. The Variations table has a unique index on sku, and the MarketplaceData table is indexed on sku + marketplace.
Ultimately, what I need is a list of distinct parent_id values that meet the criteria.
Any help or guidance would be greatly appreciated.
Thanks!
sql postgresql
asked Mar 24 at 8:58
potorik
3 Answers
Instead of IN and NOT IN, you could use an INNER JOIN and a LEFT JOIN with a check for NULL:
SELECT DISTINCT v.parent_id
FROM Variations v
INNER JOIN (
    SELECT sku FROM MarketplaceData WHERE marketplace IN ('website1','website2')
) t1 ON t1.sku = v.sku
LEFT JOIN (
    SELECT sku FROM MarketplaceData WHERE marketplace IN ('website3','website4')
) t2 ON t2.sku = v.sku
WHERE t2.sku IS NULL
answered Mar 24 at 9:12
scaisEdge
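The join-based rewrite above can be checked against the question's sample rows. The following is a minimal sketch, not a benchmark: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL, leaves out the id and price columns of MarketplaceData because the query does not read them, and adds hypothetical website3 and website4 listings (the question shows none). It also runs an EXISTS / NOT EXISTS formulation, which is a swapped-in alternative and not part of the answer, to confirm that both return the same parents.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Variations (id INTEGER PRIMARY KEY, parent_id INTEGER, sku TEXT UNIQUE);
CREATE TABLE MarketplaceData (sku TEXT, marketplace TEXT, UNIQUE (sku, marketplace));
""")

variations = [(1, 1, "ABC-1"), (2, 1, "ABC-2"), (3, 1, "ABC-3"),
              (4, 2, "DEF-1"), (5, 2, "DEF-2"),
              (6, 3, "XYZ-1"), (7, 3, "XYZ-2")]
conn.executemany("INSERT INTO Variations VALUES (?, ?, ?)", variations)

# As in the question, every sku is listed on website1 and website2.
listings = [(sku, site) for _, _, sku in variations
            for site in ("website1", "website2")]
# Hypothetical rows: all ABC skus on website3, DEF-1 on website4.
listings += [("ABC-1", "website3"), ("ABC-2", "website3"),
             ("ABC-3", "website3"), ("DEF-1", "website4")]
conn.executemany(
    "INSERT INTO MarketplaceData (sku, marketplace) VALUES (?, ?)", listings)

join_sql = """
SELECT DISTINCT v.parent_id
FROM Variations v
INNER JOIN (
    SELECT sku FROM MarketplaceData WHERE marketplace IN ('website1','website2')
) t1 ON t1.sku = v.sku
LEFT JOIN (
    SELECT sku FROM MarketplaceData WHERE marketplace IN ('website3','website4')
) t2 ON t2.sku = v.sku
WHERE t2.sku IS NULL
ORDER BY v.parent_id
"""

exists_sql = """
SELECT DISTINCT v.parent_id
FROM Variations v
WHERE EXISTS (SELECT 1 FROM MarketplaceData m
              WHERE m.sku = v.sku AND m.marketplace IN ('website1','website2'))
  AND NOT EXISTS (SELECT 1 FROM MarketplaceData m
                  WHERE m.sku = v.sku AND m.marketplace IN ('website3','website4'))
ORDER BY v.parent_id
"""

join_ids = [row[0] for row in conn.execute(join_sql)]
exists_ids = [row[0] for row in conn.execute(exists_sql)]
print(join_ids, exists_ids)
```

With these rows, parent 1 drops out because all of its skus are on website3, while parents 2 and 3 each keep at least one sku that is missing from website3 and website4. On PostgreSQL, comparing the plans with EXPLAIN (ANALYZE) on the real tables is the way to see which form is actually faster; that depends on the data and the indexes.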
What if you use only a single subquery?
SELECT DISTINCT parent_id
FROM Variations
WHERE sku IN (SELECT sku FROM MarketplaceData WHERE marketplace IN ('website1','website2')
EXCEPT
SELECT sku FROM MarketplaceData WHERE marketplace IN ('website3','website4'))
LIMIT 20 OFFSET 0
answered Mar 24 at 9:52
a_horse_with_no_name
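Besides performance, EXCEPT and NOT IN differ in how they treat NULL: if the excluded subquery ever returns a NULL sku, `sku NOT IN (...)` can never evaluate to true, so the query returns no rows at all, whereas EXCEPT simply ignores that NULL. The sketch below illustrates this under stated assumptions: the question does not say whether MarketplaceData.sku is nullable, the row with a NULL sku is hypothetical, the tables are cut down to the columns the queries use, and in-memory SQLite (via Python's sqlite3 module) stands in for PostgreSQL.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Variations (parent_id INTEGER, sku TEXT);
CREATE TABLE MarketplaceData (sku TEXT, marketplace TEXT);
INSERT INTO Variations VALUES (1, 'ABC-1'), (2, 'DEF-1'), (3, 'XYZ-1');
INSERT INTO MarketplaceData VALUES
    ('ABC-1', 'website1'), ('DEF-1', 'website1'), ('XYZ-1', 'website2'),
    ('ABC-1', 'website3'),
    (NULL, 'website4');  -- hypothetical bad row with a missing sku
""")

not_in_sql = """
SELECT DISTINCT parent_id
FROM Variations
WHERE sku IN (SELECT sku FROM MarketplaceData
              WHERE marketplace IN ('website1','website2'))
  AND sku NOT IN (SELECT sku FROM MarketplaceData
                  WHERE marketplace IN ('website3','website4'))
ORDER BY parent_id
"""

except_sql = """
SELECT DISTINCT parent_id
FROM Variations
WHERE sku IN (SELECT sku FROM MarketplaceData
              WHERE marketplace IN ('website1','website2')
              EXCEPT
              SELECT sku FROM MarketplaceData
              WHERE marketplace IN ('website3','website4'))
ORDER BY parent_id
"""

not_in_ids = [row[0] for row in conn.execute(not_in_sql)]
except_ids = [row[0] for row in conn.execute(except_sql)]
print("NOT IN:", not_in_ids)
print("EXCEPT:", except_ids)
```

If sku is declared NOT NULL, the two forms return the same rows and the difference is only in how the planner executes them.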
How about a simple aggregation to get the skus?
select mpd.sku
from MarketplaceData mpd
where mpd.marketplace in ('website1', 'website2', 'website3', 'website4')
group by mpd.sku
having count(*) filter (where mpd.marketplace in ('website1', 'website2')) > 0 and
count(*) filter (where mpd.marketplace in ('website3', 'website4')) = 0;
Then to get the parent ids:
select distinct v.parent_id
from variations v join
(select mpd.sku
from MarketplaceData mpd
where mpd.marketplace in ('website1', 'website2', 'website3', 'website4')
group by mpd.sku
having count(*) filter (where mpd.marketplace in ('website1', 'website2')) > 0 and
count(*) filter (where mpd.marketplace in ('website3', 'website4')) = 0
) m
on m.sku = v.sku;
answered Mar 24 at 12:36
Gordon Linoff
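The aggregate FILTER clause used above needs PostgreSQL 9.4 or later. On older versions, or on engines without FILTER, the same conditional counts can be written with SUM(CASE ...). The sketch below is one way to do that, assuming in-memory SQLite (via Python's sqlite3 module) as a stand-in for PostgreSQL, hypothetical website3 and website4 rows, a MarketplaceData table cut down to the two columns the query reads, and a hypothetical helper `parent_ids` that takes non-empty lists of marketplaces to include and exclude.

```python
import sqlite3

def parent_ids(conn, include, exclude):
    """Parents with a sku on any `include` site and on no `exclude` site.

    Hypothetical helper; both lists are assumed to be non-empty.
    """
    inc = ",".join("?" * len(include))  # e.g. "?,?" for two marketplaces
    exc = ",".join("?" * len(exclude))
    sql = f"""
        SELECT DISTINCT v.parent_id
        FROM Variations v
        JOIN (
            SELECT sku
            FROM MarketplaceData
            WHERE marketplace IN ({inc},{exc})
            GROUP BY sku
            HAVING SUM(CASE WHEN marketplace IN ({inc}) THEN 1 ELSE 0 END) > 0
               AND SUM(CASE WHEN marketplace IN ({exc}) THEN 1 ELSE 0 END) = 0
        ) m ON m.sku = v.sku
        ORDER BY v.parent_id
    """
    # Parameters follow the order of the placeholders in the SQL text.
    params = [*include, *exclude, *include, *exclude]
    return [row[0] for row in conn.execute(sql, params)]

# Assumption: in-memory SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Variations (id INTEGER PRIMARY KEY, parent_id INTEGER, sku TEXT UNIQUE);
CREATE TABLE MarketplaceData (sku TEXT, marketplace TEXT, UNIQUE (sku, marketplace));
""")
variations = [(1, 1, "ABC-1"), (2, 1, "ABC-2"), (3, 1, "ABC-3"),
              (4, 2, "DEF-1"), (5, 2, "DEF-2"),
              (6, 3, "XYZ-1"), (7, 3, "XYZ-2")]
conn.executemany("INSERT INTO Variations VALUES (?, ?, ?)", variations)

# As in the question, every sku is listed on website1 and website2.
listings = [(sku, site) for _, _, sku in variations
            for site in ("website1", "website2")]
# Hypothetical rows: all ABC skus on website3, DEF-1 on website4.
listings += [("ABC-1", "website3"), ("ABC-2", "website3"),
             ("ABC-3", "website3"), ("DEF-1", "website4")]
conn.executemany(
    "INSERT INTO MarketplaceData (sku, marketplace) VALUES (?, ?)", listings)

print(parent_ids(conn, ["website1", "website2"], ["website3", "website4"]))
print(parent_ids(conn, ["website1"], ["website2"]))
```

Passing the marketplaces as bound parameters keeps the report query reusable for other marketplace combinations without string-concatenating values into the SQL.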