Please help me, I've been stuck on this for way too long :(
I have these two models:
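from django.db import models

class Specialization(models.Model):
    name = models.CharField("name", max_length=64)

class Doctor(models.Model):
    name = models.CharField("name", max_length=128)
    # ...
    # on_delete is required since Django 2.0 (CASCADE chosen here as an example)
    specialization = models.ForeignKey(Specialization, on_delete=models.CASCADE)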
I would like to annotate all specializations in a queryset with the number of doctors that have this specialization.
At first I looped over the specializations and ran a simple Doctor.objects.filter(specialization=spec).count() for each one; however, this proved to be too slow and inefficient.
The more I read, the more I realized that it would make sense to use a Subquery here to filter the doctors for the OuterRef specialization. This is what I came up with:
from django.db.models import Count, IntegerField, OuterRef, Subquery

doctors = (
    Doctor.objects.filter(specialization=OuterRef("id"))
    .values("specialization_id")
    .order_by()
)
add_doctors_count = doctors.annotate(cnt=Count("specialization_id")).values("cnt")[:1]
spec_qs_with_counts = Specialization.objects.annotate(
    num_applicable_doctors=Subquery(add_doctors_count, output_field=IntegerField())
)
The output I get is just 1 for every specialization. The code seems to annotate every doctor object with its specialization_id and then count within that group, which means the count will be 1.
Unfortunately, this doesn't completely make sense to me. In my initial attempt I used an aggregate for the count, and while it works on its own, it doesn't work as a Subquery; I get this error:
This queryset contains a reference to an outer query and may only be used in a subquery.
I posted this question before and someone suggested doing Specialization.objects.annotate(count=Count("doctor")).
However, this doesn't work because I need to count a specific queryset of Doctors.
And with the Subquery approach I'm not getting the same result either. These are the pages I've been reading:
https://docs.djangoproject.com/en/1.11/ref/models/expressions/
https://medium.com/@hansonkd/the-dramatic-benefits-of-django-subqueries-and-annotations-4195e0dafb16
If you have any questions that would make this clearer, please tell me.
Doctors per Specialization
I think you're overcomplicating things, probably because you think that Count('doctor') will count every doctor for each specialization (regardless of that doctor's specialization). It does not: if you Count such a related object, Django implicitly only looks at the related objects. In fact you cannot Count('unrelated_model') at all; counting only works through relations (reverse ones included) such as a ForeignKey, ManyToManyField, etc., since otherwise it would not make much sense.
I would like to annotate all specializations in a queryset with the number of doctors that have this specialization.
You can do this with a simple annotation:
# Counting all doctors per specialization (so not all doctors in general)
from django.db.models import Count
Specialization.objects.annotate(
num_doctors=Count('doctor')
)
Now every Specialization object in this queryset will have an extra attribute num_doctors that is an integer (the number of doctors with that specialization).
You can also filter on the Specializations in the same query (for example, to only obtain specializations whose name ends in 'my'). As long as you do not filter on the related doctors set, the Count will work (see the section below for how to do that).
If you do filter on the related doctors, however, the count will only include the doctors that pass that filter. Furthermore, if you filter on another related object, this results in an extra JOIN that acts as a multiplier for the Counts. In that case it might be better to use num_doctors=Count('doctor', distinct=True) instead. You can always use distinct=True (regardless of whether you add extra JOINs or not), but it has a small performance impact.
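To see the multiplier effect in isolation, here is a small sketch using Python's built-in sqlite3 module rather than the ORM. The `clinic` table, its `city` column and all rows are hypothetical, standing in for "another related object" as in a query like Specialization.objects.filter(clinic__city='Ghent').annotate(num_doctors=Count('doctor')); the SQL only approximates what Django would generate. With two doctors and two matching clinics, the join yields 2 × 2 rows, so a plain COUNT reports 4, while COUNT(DISTINCT ...), which is what distinct=True asks for, reports 2.

```python
import sqlite3

# Hypothetical schema: "clinic" stands in for any other related model of
# Specialization; tables and rows are made up for illustration.
SCHEMA = """
CREATE TABLE specialization (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE doctor (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                     specialization_id INTEGER NOT NULL);
CREATE TABLE clinic (id INTEGER PRIMARY KEY, city TEXT NOT NULL,
                     specialization_id INTEGER NOT NULL);
INSERT INTO specialization VALUES (1, 'cardiology');
INSERT INTO doctor VALUES (1, 'Joe Smith', 1), (2, 'Ann Lee', 1);
INSERT INTO clinic VALUES (1, 'Ghent', 1), (2, 'Ghent', 1), (3, 'Brussels', 1);
"""

# Counting doctors while also filtering on clinics: every doctor row is
# repeated once per matching clinic row (2 doctors x 2 Ghent clinics = 4 rows).
QUERY = """
SELECT {count_expr}
FROM specialization
LEFT OUTER JOIN doctor ON doctor.specialization_id = specialization.id
INNER JOIN clinic ON clinic.specialization_id = specialization.id
WHERE clinic.city = 'Ghent'
GROUP BY specialization.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
plain = conn.execute(QUERY.format(count_expr="COUNT(doctor.id)")).fetchone()[0]
distinct = conn.execute(QUERY.format(count_expr="COUNT(DISTINCT doctor.id)")).fetchone()[0]
print(plain, distinct)  # 4 2
```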
The above works because Count('doctor') does not simply add all doctors to the query: it makes a LEFT OUTER JOIN on the doctors table, and thus checks that the specialization_id of each Doctor is exactly the one we are looking for. So the query Django constructs looks roughly like:
SELECT specialization.*,
       COUNT(doctor.id) AS num_doctors
FROM specialization
LEFT OUTER JOIN doctor ON doctor.specialization_id = specialization.id
GROUP BY specialization.id
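The query above can be tried out directly with Python's built-in sqlite3 module. This is a minimal sketch, assuming simplified table names (Django would normally prefix them with the app label, e.g. myapp_doctor) and made-up sample rows. It shows that the LEFT OUTER JOIN keeps specializations without any doctors, and that COUNT(doctor.id) yields 0 for them because COUNT skips the NULLs the outer join produces.

```python
import sqlite3

# Simplified, hypothetical tables mirroring the two models, with sample rows.
SCHEMA = """
CREATE TABLE specialization (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE doctor (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    specialization_id INTEGER NOT NULL REFERENCES specialization(id)
);
INSERT INTO specialization VALUES (1, 'cardiology'), (2, 'neurology'), (3, 'dermatology');
INSERT INTO doctor VALUES (1, 'Joe Smith', 1), (2, 'Ann Lee', 1), (3, 'Joe Brown', 2);
"""


def counts_via_join(conn):
    """A single query: LEFT OUTER JOIN + GROUP BY, like Count('doctor')."""
    rows = conn.execute("""
        SELECT specialization.name, COUNT(doctor.id) AS num_doctors
        FROM specialization
        LEFT OUTER JOIN doctor ON doctor.specialization_id = specialization.id
        GROUP BY specialization.id
        ORDER BY specialization.id
    """).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(counts_via_join(conn))  # {'cardiology': 2, 'neurology': 1, 'dermatology': 0}
```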
Doing the same with a subquery gives functionally the same results, but if the Django ORM and the database management system cannot find a way to optimize it, this can result in an expensive query, since the database may then have to evaluate an extra subquery for every specialization.
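For comparison, the subquery variant can be sketched in plain SQL as a correlated scalar subquery, again with sqlite3, simplified hypothetical table names and made-up rows; it returns the same counts as the JOIN version. On the ORM side, note that .aggregate() executes the query immediately and returns a dict, which is why it cannot be used inside a Subquery (hence the "may only be used in a subquery" error): the inner queryset has to stay lazy. The commented ORM lines in the block are an untested sketch of one common way to express this; the Coalesce around it is there because a grouped subquery returns no row, i.e. NULL, for a specialization without doctors, whereas the plain COUNT(*) below has no GROUP BY and already returns 0.

```python
import sqlite3

# Roughly corresponding ORM sketch (untested here; assumes the question's models
# and imports of Count, IntegerField, OuterRef, Subquery and Coalesce):
#   doctors = (Doctor.objects.filter(specialization=OuterRef("pk"))
#              .order_by().values("specialization")
#              .annotate(cnt=Count("pk")).values("cnt"))
#   Specialization.objects.annotate(
#       num_doctors=Coalesce(Subquery(doctors, output_field=IntegerField()), 0))

# Simplified, hypothetical tables with sample rows.
SCHEMA = """
CREATE TABLE specialization (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE doctor (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    specialization_id INTEGER NOT NULL REFERENCES specialization(id)
);
INSERT INTO specialization VALUES (1, 'cardiology'), (2, 'neurology'), (3, 'dermatology');
INSERT INTO doctor VALUES (1, 'Joe Smith', 1), (2, 'Ann Lee', 1), (3, 'Joe Brown', 2);
"""


def counts_via_subquery(conn):
    """Conceptually one correlated COUNT subquery per specialization row."""
    rows = conn.execute("""
        SELECT s.name,
               (SELECT COUNT(*) FROM doctor AS d
                WHERE d.specialization_id = s.id) AS num_doctors
        FROM specialization AS s
        ORDER BY s.id
    """).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(counts_via_subquery(conn))  # {'cardiology': 2, 'neurology': 1, 'dermatology': 0}
```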
Doctors per Specialization
Say, however, that you want to count only the doctors whose name starts with Joe; then you can add a filter on the related doctor, like:
# counting only the Doctors whose name starts with 'Joe', per specialization
from django.db.models import Count

Specialization.objects.filter(
    doctor__name__startswith='Joe'  # sample filter
).annotate(
    num_doctors=Count('doctor')
)
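One consequence of filtering before annotating is that specializations without any matching doctor disappear from the result, since the filter becomes an INNER JOIN plus a WHERE clause. If they should stay with a count of 0, Django 2.0 and later also accept a conditional aggregate, Count('doctor', filter=Q(doctor__name__startswith='Joe')), which Django emulates with a CASE WHEN expression on databases that lack the SQL FILTER clause. The sketch below compares the two shapes in plain SQL with sqlite3; table names and rows are made up, and the SQL approximates, rather than reproduces, what the ORM emits.

```python
import sqlite3

# Simplified, hypothetical tables with sample rows.
SCHEMA = """
CREATE TABLE specialization (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE doctor (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                     specialization_id INTEGER NOT NULL REFERENCES specialization(id));
INSERT INTO specialization VALUES (1, 'cardiology'), (2, 'neurology'), (3, 'dermatology');
INSERT INTO doctor VALUES (1, 'Joe Smith', 1), (2, 'Ann Lee', 1), (3, 'Joe Brown', 2);
"""

# Filter, then count: roughly filter(doctor__name__startswith='Joe')
# followed by annotate(num_doctors=Count('doctor')).
# (SQLite's LIKE is case-insensitive for ASCII; that does not matter for this data.)
FILTER_THEN_COUNT = """
SELECT s.name, COUNT(d.id)
FROM specialization AS s
INNER JOIN doctor AS d ON d.specialization_id = s.id
WHERE d.name LIKE 'Joe%'
GROUP BY s.id
ORDER BY s.id
"""

# Conditional count: roughly Count('doctor', filter=Q(doctor__name__startswith='Joe')).
# Non-matching doctors map to NULL, which COUNT skips, so every specialization stays.
CONDITIONAL_COUNT = """
SELECT s.name, COUNT(CASE WHEN d.name LIKE 'Joe%' THEN d.id END)
FROM specialization AS s
LEFT OUTER JOIN doctor AS d ON d.specialization_id = s.id
GROUP BY s.id
ORDER BY s.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
filtered = dict(conn.execute(FILTER_THEN_COUNT).fetchall())
conditional = dict(conn.execute(CONDITIONAL_COUNT).fetchall())
print(filtered)     # {'cardiology': 1, 'neurology': 1}
print(conditional)  # {'cardiology': 1, 'neurology': 1, 'dermatology': 0}
```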