Fixed #10045 -- Corrected docs about .annotate()/.filter() ordering.

Thanks Josh, Anssi, and Carl for reviews and advice.
Tim Graham 2015-10-29 18:43:53 -04:00
parent 8c553e7d3f
commit 91a431f48c
1 changed file with 58 additions and 18 deletions

@@ -184,6 +184,8 @@ of the ``annotate()`` clause is a ``QuerySet``; this ``QuerySet`` can be
 modified using any other ``QuerySet`` operation, including ``filter()``,
 ``order_by()``, or even additional calls to ``annotate()``.
 
+.. _combining-multiple-aggregations:
+
 Combining multiple aggregations
 -------------------------------
 
@@ -340,29 +342,67 @@ Order of ``annotate()`` and ``filter()`` clauses
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 When developing a complex query that involves both ``annotate()`` and
-``filter()`` clauses, particular attention should be paid to the order
-in which the clauses are applied to the ``QuerySet``.
+``filter()`` clauses, pay particular attention to the order in which the
+clauses are applied to the ``QuerySet``.
 
-When an ``annotate()`` clause is applied to a query, the annotation is
-computed over the state of the query up to the point where the annotation
-is requested. The practical implication of this is that ``filter()`` and
-``annotate()`` are not commutative operations -- that is, there is a
-difference between the query::
+When an ``annotate()`` clause is applied to a query, the annotation is computed
+over the state of the query up to the point where the annotation is requested.
+The practical implication of this is that ``filter()`` and ``annotate()`` are
+not commutative operations.
 
-    >>> Publisher.objects.annotate(num_books=Count('book')).filter(book__rating__gt=3.0)
+Given:
 
-and the query::
+* Publisher A has two books with ratings 4 and 5.
+* Publisher B has two books with ratings 1 and 4.
+* Publisher C has one book with rating 1.
 
-    >>> Publisher.objects.filter(book__rating__gt=3.0).annotate(num_books=Count('book'))
+Here's an example with the ``Count`` aggregate::
 
-Both queries will return a list of publishers that have at least one good
-book (i.e., a book with a rating exceeding 3.0). However, the annotation in
-the first query will provide the total number of all books published by the
-publisher; the second query will only include good books in the annotated
-count. In the first query, the annotation precedes the filter, so the
-filter has no effect on the annotation. In the second query, the filter
-precedes the annotation, and as a result, the filter constrains the objects
-considered when calculating the annotation.
+    >>> a, b = Publisher.objects.annotate(num_books=Count('book', distinct=True)).filter(book__rating__gt=3.0)
+    >>> a, a.num_books
+    (<Publisher: A>, 2)
+    >>> b, b.num_books
+    (<Publisher: B>, 2)
+
+    >>> a, b = Publisher.objects.filter(book__rating__gt=3.0).annotate(num_books=Count('book'))
+    >>> a, a.num_books
+    (<Publisher: A>, 2)
+    >>> b, b.num_books
+    (<Publisher: B>, 1)
+
+Both queries return a list of publishers that have at least one book with a
+rating exceeding 3.0, hence publisher C is excluded.
+
+In the first query, the annotation precedes the filter, so the filter has no
+effect on the annotation. ``distinct=True`` is required to avoid a
+:ref:`cross-join bug <combining-multiple-aggregations>`.
+
+The second query counts the number of books that have a rating exceeding 3.0
+for each publisher. The filter precedes the annotation, so the filter
+constrains the objects considered when calculating the annotation.
+
+Here's another example with the ``Avg`` aggregate::
+
+    >>> a, b = Publisher.objects.annotate(avg_rating=Avg('book__rating')).filter(book__rating__gt=3.0)
+    >>> a, a.avg_rating
+    (<Publisher: A>, 4.5)  # (5+4)/2
+    >>> b, b.avg_rating
+    (<Publisher: B>, 2.5)  # (1+4)/2
+
+    >>> a, b = Publisher.objects.filter(book__rating__gt=3.0).annotate(avg_rating=Avg('book__rating'))
+    >>> a, a.avg_rating
+    (<Publisher: A>, 4.5)  # (5+4)/2
+    >>> b, b.avg_rating
+    (<Publisher: B>, 4.0)  # 4/1 (book with rating 1 excluded)
+
+The first query asks for the average rating of all of a publisher's books for
+publishers that have at least one book with a rating exceeding 3.0. The second
+query asks for the average of a publisher's book ratings for only those
+ratings exceeding 3.0.
+
+It's difficult to intuit how the ORM will translate complex querysets into SQL
+queries, so when in doubt, inspect the SQL with ``str(queryset.query)`` and
+write plenty of tests.
 
 ``order_by()``
 --------------
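
The numbers quoted in the new examples can be checked without a Django project. The sketch below uses Python's standard `sqlite3` module with the Publisher A/B/C data from the change. The SQL is hand-written to approximate the two clause orders and is not the ORM's actual output, and the table names, column names, and aliases (`counted`, `matched`) are assumptions made for illustration.

```python
import sqlite3

# Hypothetical two-table schema standing in for the Publisher and Book models.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        publisher_id INTEGER NOT NULL REFERENCES publisher (id),
        rating REAL NOT NULL
    );
    INSERT INTO publisher (id, name) VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO book (publisher_id, rating)
    VALUES (1, 4), (1, 5), (2, 1), (2, 4), (3, 1);
    """
)

# Approximates annotate(...).filter(...): one join feeds the aggregates and a
# second, independent join applies the rating filter. The two joins multiply
# rows, which is why the count needs DISTINCT.
ANNOTATE_THEN_FILTER = """
    SELECT p.name, COUNT(DISTINCT counted.id), AVG(counted.rating)
    FROM publisher AS p
    LEFT JOIN book AS counted ON counted.publisher_id = p.id
    INNER JOIN book AS matched ON matched.publisher_id = p.id
    WHERE matched.rating > 3.0
    GROUP BY p.id, p.name
    ORDER BY p.name
"""

# Approximates filter(...).annotate(...): a single join is filtered first, so
# the aggregates only see books rated above 3.0.
FILTER_THEN_ANNOTATE = """
    SELECT p.name, COUNT(b.id), AVG(b.rating)
    FROM publisher AS p
    INNER JOIN book AS b ON b.publisher_id = p.id
    WHERE b.rating > 3.0
    GROUP BY p.id, p.name
    ORDER BY p.name
"""

annotate_then_filter = conn.execute(ANNOTATE_THEN_FILTER).fetchall()
filter_then_annotate = conn.execute(FILTER_THEN_ANNOTATE).fetchall()

print(annotate_then_filter)  # [('A', 2, 4.5), ('B', 2, 2.5)]
print(filter_then_annotate)  # [('A', 2, 4.5), ('B', 1, 4.0)]
```

Publisher C drops out of both result sets because it has no book rated above 3.0. Removing `DISTINCT` from the first statement changes publisher A's count from 2 to 4, which is the row multiplication that `distinct=True` guards against in the documented example.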