Django Q Queries & on the same field?
So here are my models:

    class Event(models.Model):
        user = models.ForeignKey(User, blank=True, null=True, db_index=True)
        name = models.CharField(max_length=200, db_index=True)
        platform = models.CharField(choices=(("ios", "ios"), ("android", "android")), max_length=50)

    class User(AbstractUser):
        email = models.CharField(max_length=50, null=False, blank=False, unique=True)

Event is like an analytics event, so it's very possible to have multiple events for one user, some with platform=ios and some with platform=android, if the user has logged in on multiple devices. I want to find out how many users have both iOS and Android devices. So I wrote a query like this:

    User.objects.filter(Q(event__platform="ios") & Q(event__platform="android")).count()

This returns 0, which I know isn't correct. I then tried querying for iOS users only:

    User.objects.filter(Q(event__platform="ios")).count()

This returned 6,717,622, which is unexpected because I only have 39,294 users. I'm guessing it's counting Event instances rather than Users, which seems like incorrect behavior to me. Does anyone have any insight into this problem?
sql django django-queryset django-q
asked Mar 28 at 14:57 by Chase Roberts
The second query looks fine. Try adding .order_by() before count() to remove any default ordering and see if it works then. Default ordering defined in a model's Meta can sabotage you in subtle ways.
– Endre Both
Mar 29 at 9:16
I added .order_by('user_id') with the same results. What does work is adding .distinct('id'), although the query still takes a very long time. My guess is that I'm not going to get it faster without flattening my database structure.
– Chase Roberts
Mar 30 at 12:03
I assume you already have an index on platform. Using integers instead of strings, as suggested by Navid, also helps. Finally, raw SQL that doesn't use joins but accesses only the Event table should speed up your queries by up to two orders of magnitude (while still not producing instant results for a table of this size).
– Endre Both
Apr 1 at 8:07
2 orders of magnitude should be perfect.
– Chase Roberts
Apr 3 at 13:31
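The "raw SQL on the Event table only" idea from the comments could look roughly like the sketch below. It is not the commenter's actual query: the table name, column names and sample rows are assumptions that mirror the models in the question, and it runs against an in-memory SQLite database so it can be tried without a Django project.

```python
import sqlite3

# Hypothetical miniature of the Event table; names mirror the question's models.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_event (id INTEGER PRIMARY KEY, user_id INTEGER, platform TEXT)"
)
conn.executemany(
    "INSERT INTO app_event (user_id, platform) VALUES (?, ?)",
    [
        (1, "ios"), (1, "ios"), (1, "android"),  # user 1: both platforms
        (2, "ios"), (2, "ios"),                  # user 2: iOS only
        (3, "android"),                          # user 3: Android only
        (None, "ios"),                           # anonymous event without a user
    ],
)

# Group events per user and keep only users seen on both platforms.
# No join with the user table is needed for the count itself.
both_sql = """
    SELECT COUNT(*) FROM (
        SELECT user_id
        FROM app_event
        WHERE user_id IS NOT NULL AND platform IN ('ios', 'android')
        GROUP BY user_id
        HAVING COUNT(DISTINCT platform) = 2
    )
"""
users_with_both = conn.execute(both_sql).fetchone()[0]
print(users_with_both)
```

Whether this is actually faster on a table with millions of rows depends on the database and its indexes, so it is worth checking the query plan before relying on it.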
2 Answers
You can use annotations instead:

    from django.db.models import Count
    User.objects.all().annotate(events_count=Count('event')).filter(events_count=2)

This keeps only the users that have exactly two events. Note that it counts events, not distinct platforms, so on its own it does not tell you which users have both iOS and Android.

You can also use chained filters:

    User.objects.filter(event__platform='android').filter(event__platform='ios')

The first filter gets all users with an Android event, and the second narrows them to the users that also have an iOS event.
answered Mar 28 at 15:10 by Navid Zarepak; edited Mar 29 at 8:47 by Endre Both
The first query annotates the counts per user. The second works fine, apologies for my earlier comment. It looks identical to the asker's first query, but it is not when filtering across x-to-many relationships, as here.
– Endre Both
Mar 29 at 8:45
I shouldn't be picky, but I don't particularly like this answer. Calling User.objects.filter(event__platform='android') causes a join and returns >6M results, when my question is why I don't get 39k results. I think the answer is that I need to use .distinct(), which seems contradictory to what the docs say. And chaining it with another .filter() is going to cause another join, which doesn't exactly return quickly when you have a table with >6M rows.
– Chase Roberts
Mar 30 at 11:41
You're filtering your results based on another table, so of course you will have joins. That's how you designed your database, and this is how a database works. You can get the ids for android and ios and use those, which is faster than filtering by strings. You can also apply the second filter to a distinct queryset, which is faster. How to optimize these kinds of queries is beyond this question; you can open another question to discuss that. This is the answer to your question using the Django ORM, unless you want to use raw SQL, which is yet another topic.
– Navid Zarepak
Mar 30 at 12:01
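The behaviour discussed in this thread (0 results from a single filter(), millions of rows from the join, and the effect of chaining or DISTINCT) can be reproduced on a toy database. The SQL below only approximates the shapes Django generates for these querysets; the table names and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE app_event (id INTEGER PRIMARY KEY, user_id INTEGER, platform TEXT);
    INSERT INTO app_user (id, email) VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO app_event (user_id, platform) VALUES
        (1, 'ios'), (1, 'ios'), (1, 'android'), (2, 'ios');
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# One filter() with both conditions: a single join, so one event row would
# have to be 'ios' and 'android' at the same time. Nothing matches.
single_join = scalar("""
    SELECT COUNT(*) FROM app_user u
    JOIN app_event e ON e.user_id = u.id
    WHERE e.platform = 'ios' AND e.platform = 'android'
""")

# One condition across the join: one result row per matching event,
# so users with several iOS events are counted several times.
ios_rows = scalar("""
    SELECT COUNT(*) FROM app_user u
    JOIN app_event e ON e.user_id = u.id
    WHERE e.platform = 'ios'
""")

# The same join, counting each user only once (what .distinct() achieves).
ios_users = scalar("""
    SELECT COUNT(DISTINCT u.id) FROM app_user u
    JOIN app_event e ON e.user_id = u.id
    WHERE e.platform = 'ios'
""")

# Chained filter() calls: each condition gets its own join to the event table.
both_platforms = scalar("""
    SELECT COUNT(DISTINCT u.id) FROM app_user u
    JOIN app_event e1 ON e1.user_id = u.id
    JOIN app_event e2 ON e2.user_id = u.id
    WHERE e1.platform = 'android' AND e2.platform = 'ios'
""")

print(single_join, ios_rows, ios_users, both_platforms)
```

With this sample data only user 1 has events on both platforms, and user 1's two iOS events are what inflate the plain join count.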
This is a general answer for a queryset with two or more conditions on child objects.

Solution: A simple solution with two subqueries is possible, even without any join:

    from django.db.models import Q

    base_subq = Event.objects.values('user_id').order_by().distinct()
    user_qs = User.objects.filter(
        Q(pk__in=base_subq.filter(platform="android")) &
        Q(pk__in=base_subq.filter(platform="ios"))
    )

The .order_by() call is important if the Event model has a default ordering (see the notes on distinct() in the Django docs).
Notes:

Verify the single SQL query that will be executed (simplified by removing the "app_" prefix):

    >>> print(str(user_qs.query))
    SELECT user.id, user.email FROM user WHERE (
        user.id IN (SELECT DISTINCT U0.user_id FROM event U0 WHERE U0.platform = 'android')
        AND
        user.id IN (SELECT DISTINCT U0.user_id FROM event U0 WHERE U0.platform = 'ios')
    )

- Q() is used because the same keyword argument (pk__in) cannot be repeated in one filter() call; chained filters, .filter(...).filter(...), could be used instead. (The order of the filter conditions is not important; the SQL server's query optimizer decides the execution order from its own estimates.)
- The temporary variable base_subq is an "alias" queryset that only avoids repeating the shared part of the expression; it is never evaluated on its own.
- One join between User (parent) and Event (child) wouldn't be a problem, and a solution with one subquery is also possible, but a join of Event with Event (a join with a repeated child object or with two child objects) should be avoided by using a subquery in any case. Two subqueries are nice for readability because they show the symmetry of the two filter conditions.
Another solution, with two nested subqueries: This asymmetric solution can be faster if we know that one subquery (the one we put innermost) has a much more restrictive filter than the other necessary subquery, which has a huge result set (for example, if the number of Android users were huge).

    ios_user_ids = (Event.objects.filter(platform="ios")
                    .values('user_id').order_by().distinct())
    user_ids = (Event.objects.filter(platform="android", user_id__in=ios_user_ids)
                .values('user_id').order_by().distinct())
    user_qs = User.objects.filter(pk__in=user_ids)

Verify how it is compiled to SQL (simplified again by removing the app_ prefix and the double quotes):

    >>> print(str(user_qs.query))
    SELECT user.id, user.email FROM user
    WHERE user.id IN (
        SELECT DISTINCT V0.user_id FROM event V0
        WHERE V0.platform = 'android' AND V0.user_id IN (
            SELECT DISTINCT U0.user_id FROM event U0
            WHERE U0.platform = 'ios'
        )
    )

(These solutions also work in old Django versions, e.g. 1.8. A dedicated Subquery() expression has existed since Django 1.11 for more complicated cases, but it isn't needed for this simple question.)
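As a quick sanity check, the two SQL shapes above can be run against a small in-memory SQLite database to confirm that they select the same users. This is only a sketch: the app_user and app_event table names and the sample rows are assumptions, and it says nothing about relative speed on a large table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE app_event (id INTEGER PRIMARY KEY, user_id INTEGER, platform TEXT);
    INSERT INTO app_user (id, email) VALUES
        (1, 'a@example.com'), (2, 'b@example.com'),
        (3, 'c@example.com'), (4, 'd@example.com');
    INSERT INTO app_event (user_id, platform) VALUES
        (1, 'ios'), (1, 'android'),
        (2, 'ios'),
        (3, 'android'),
        (4, 'android'), (4, 'ios'), (4, 'ios');
""")

# Symmetric form: two independent IN subqueries combined with AND.
symmetric_sql = """
    SELECT id FROM app_user
    WHERE id IN (SELECT DISTINCT user_id FROM app_event WHERE platform = 'android')
      AND id IN (SELECT DISTINCT user_id FROM app_event WHERE platform = 'ios')
    ORDER BY id
"""

# Nested form: the iOS subquery sits inside the Android subquery.
nested_sql = """
    SELECT id FROM app_user
    WHERE id IN (
        SELECT DISTINCT user_id FROM app_event
        WHERE platform = 'android' AND user_id IN (
            SELECT DISTINCT user_id FROM app_event WHERE platform = 'ios'
        )
    )
    ORDER BY id
"""

symmetric_ids = [row[0] for row in conn.execute(symmetric_sql)]
nested_ids = [row[0] for row in conn.execute(nested_sql)]
print(symmetric_ids, nested_ids)
```

In Django itself, calling user_qs.count() on either queryset would return the number of such users in one query.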
edited Apr 4 at 17:50
answered Apr 4 at 0:37
hynekcer
The second query looks fine; try adding .order_by() before count to remove any default ordering and see if it works then. Default ordering defined in a model's Meta can sabotage you in subtle ways.
– Endre Both, Mar 29 at 9:16
I added a .order_by('user_id') with the same results. What does work is to add .distinct('id'), although the query still takes a very long time. My guess is that I'm not going to get it faster without flattening my database structure.
– Chase Roberts, Mar 30 at 12:03
I assume you already have an index on platform. Using integers instead of strings as suggested by Navid also helps. Finally, raw SQL that doesn't use joins but accesses the Event table only should speed up your queries by up to two orders of magnitude (while still not producing instant results for a table of this size).
– Endre Both, Apr 1 at 8:07
2 orders of magnitude should be perfect.
– Chase Roberts, Apr 3 at 13:31