[1.6.x] Fixed #20842 and #20845 - Added a note on order_by() and improved prefetch_related() docs.

Backport of e8183a8193 from master
Daniele Procida 2013-08-02 14:26:19 +01:00 committed by Tim Graham
parent 77293f9354
commit 74205c4a3c
1 changed files with 66 additions and 35 deletions


@ -110,7 +110,7 @@ described here.
.. admonition:: You can't share pickles between versions .. admonition:: You can't share pickles between versions
Pickles of QuerySets are only valid for the version of Django that Pickles of ``QuerySets`` are only valid for the version of Django that
was used to generate them. If you generate a pickle using Django was used to generate them. If you generate a pickle using Django
version N, there is no guarantee that pickle will be readable with version N, there is no guarantee that pickle will be readable with
Django version N+1. Pickles should not be used as part of a long-term Django version N+1. Pickles should not be used as part of a long-term
@ -300,14 +300,30 @@ Be cautious when ordering by fields in related models if you are also using
:meth:`distinct()`. See the note in :meth:`distinct` for an explanation of how :meth:`distinct()`. See the note in :meth:`distinct` for an explanation of how
related model ordering can change the expected results. related model ordering can change the expected results.
It is permissible to specify a multi-valued field to order the results by (for .. note::
example, a :class:`~django.db.models.ManyToManyField` field). Normally It is permissible to specify a multi-valued field to order the results by
this won't be a sensible thing to do and it's really an advanced usage (for example, a :class:`~django.db.models.ManyToManyField` field, or the
feature. However, if you know that your queryset's filtering or available data reverse relation of a :class:`~django.db.models.ForeignKey` field).
implies that there will only be one ordering piece of data for each of the main
items you are selecting, the ordering may well be exactly what you want to do. Consider this case::
Use ordering on multi-valued fields with care and make sure the results are
what you expect. class Event(Model):
parent = models.ForeignKey('self', related_name='children')
date = models.DateField()
Event.objects.order_by('children__date')
Here, each ``Event`` can have several ordering values, one per child;
each ``Event`` with multiple ``children`` will appear multiple times
in the new ``QuerySet`` that ``order_by()`` creates. In other words,
using ``order_by()`` on the ``QuerySet`` could return more items than you
were working on to begin with - which is probably neither expected nor
useful.
Thus, take care when using a multi-valued field to order the results. **If**
you can be sure that there will only be one piece of ordering data for each
of the items you're ordering, this approach should not present problems. If
not, make sure the results are what you expect.
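The duplication described in the note can be reproduced outside Django with the standard-library ``sqlite3`` module. This is only a sketch: the table name, column names, and the SQL are assumptions that approximate the kind of join an ordering on ``children__date`` implies, not the exact statement Django generates.

```python
import sqlite3

# Hypothetical table mirroring the Event model above; the table and column
# names are assumptions, not necessarily what Django would create.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (id INTEGER PRIMARY KEY, parent_id INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO event (id, parent_id, date) VALUES (?, ?, ?)",
    [
        (1, None, "2013-01-01"),  # a parent with two children
        (2, 1, "2013-02-01"),
        (3, 1, "2013-03-01"),
    ],
)

# Ordering by a child's date requires joining each event to its children,
# so an event with several children produces several result rows.
rows = conn.execute(
    """
    SELECT event.id
    FROM event
    LEFT OUTER JOIN event AS child ON child.parent_id = event.id
    ORDER BY child.date
    """
).fetchall()

ids = [row[0] for row in rows]
total_events = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]

print("rows returned:", len(ids), "events stored:", total_events)
```

The join returns one row per (event, child) pair, so the event with two children is listed twice and the result has more rows than the table has events.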
There's no way to specify whether ordering should be case sensitive. With There's no way to specify whether ordering should be case sensitive. With
respect to case-sensitivity, Django will order results however your database respect to case-sensitivity, Django will order results however your database
@ -388,7 +404,7 @@ field names, the database will only compare the specified field names.
.. note:: .. note::
When you specify field names, you *must* provide an ``order_by()`` in the When you specify field names, you *must* provide an ``order_by()`` in the
QuerySet, and the fields in ``order_by()`` must start with the fields in ``QuerySet``, and the fields in ``order_by()`` must start with the fields in
``distinct()``, in the same order. ``distinct()``, in the same order.
For example, ``SELECT DISTINCT ON (a)`` gives you the first row for each For example, ``SELECT DISTINCT ON (a)`` gives you the first row for each
@ -805,8 +821,8 @@ stop the deluge of database queries that is caused by accessing related objects,
but the strategy is quite different. but the strategy is quite different.
``select_related`` works by creating a SQL join and including the fields of the ``select_related`` works by creating a SQL join and including the fields of the
related object in the SELECT statement. For this reason, ``select_related`` gets related object in the ``SELECT`` statement. For this reason, ``select_related``
the related objects in the same database query. However, to avoid the much gets the related objects in the same database query. However, to avoid the much
larger result set that would result from joining across a 'many' relationship, larger result set that would result from joining across a 'many' relationship,
``select_related`` is limited to single-valued relationships - foreign key and ``select_related`` is limited to single-valued relationships - foreign key and
one-to-one. one-to-one.
@ -835,39 +851,54 @@ For example, suppose you have these models::
return u"%s (%s)" % (self.name, u", ".join([topping.name return u"%s (%s)" % (self.name, u", ".join([topping.name
for topping in self.toppings.all()])) for topping in self.toppings.all()]))
and run this code:: and run::
>>> Pizza.objects.all() >>> Pizza.objects.all()
[u"Hawaiian (ham, pineapple)", u"Seafood (prawns, smoked salmon)"... [u"Hawaiian (ham, pineapple)", u"Seafood (prawns, smoked salmon)"...
The problem with this code is that it will run a query on the Toppings table for The problem with this is that every time ``Pizza.__unicode__()`` asks for
**every** item in the Pizza ``QuerySet``. Using ``prefetch_related``, this can ``self.toppings.all()`` it has to query the database, so
be reduced to two: ``Pizza.objects.all()`` will run a query on the Toppings table for **every**
item in the Pizza ``QuerySet``.
We can reduce this to just two queries using ``prefetch_related``:
>>> Pizza.objects.all().prefetch_related('toppings') >>> Pizza.objects.all().prefetch_related('toppings')
All the relevant toppings will be fetched in a single query, and used to make This implies a ``self.toppings.all()`` for each ``Pizza``; now each time
``QuerySets`` that have a pre-filled cache of the relevant results. These ``self.toppings.all()`` is called, instead of having to go to the database for
``QuerySets`` are then used in the ``self.toppings.all()`` calls. the items, it will find them in a prefetched ``QuerySet`` cache that was
populated in a single query.
The additional queries are executed after the QuerySet has begun to be evaluated That is, all the relevant toppings will have been fetched in a single query,
and the primary query has been executed. Note that the result cache of the and used to make ``QuerySets`` that have a pre-filled cache of the relevant
primary QuerySet and all specified related objects will then be fully loaded results; these ``QuerySets`` are then used in the ``self.toppings.all()`` calls.
into memory, which is often avoided in other cases - even after a query has been
executed in the database, QuerySet normally tries to make uses of chunking
between the database to avoid loading all objects into memory before you need
them.
Also remember that, as always with QuerySets, any subsequent chained methods The additional queries in ``prefetch_related()`` are executed after the
which imply a different database query will ignore previously cached results, ``QuerySet`` has begun to be evaluated and the primary query has been executed.
and retrieve data using a fresh database query. So, if you write the following:
Note that the result cache of the primary ``QuerySet`` and all specified related
objects will then be fully loaded into memory. This changes the typical
behavior of ``QuerySets``, which normally try to avoid loading all objects into
memory before they are needed, even after a query has been executed in the
database.
.. note::
Remember that, as always with ``QuerySets``, any subsequent chained methods
which imply a different database query will ignore previously cached
results, and retrieve data using a fresh database query. So, if you write
the following:
>>> pizzas = Pizza.objects.prefetch_related('toppings') >>> pizzas = Pizza.objects.prefetch_related('toppings')
>>> [list(pizza.toppings.filter(spicy=True)) for pizza in pizzas] >>> [list(pizza.toppings.filter(spicy=True)) for pizza in pizzas]
...then the fact that ``pizza.toppings.all()`` has been prefetched will not help ...then the fact that ``pizza.toppings.all()`` has been prefetched will not
you - in fact it hurts performance, since you have done a database query that help you. The ``prefetch_related('toppings')`` implied
you haven't used. So use this feature with caution! ``pizza.toppings.all()``, but ``pizza.toppings.filter()`` is a new and
different query. The prefetched cache can't help here; in fact it hurts
performance, since you have done a database query that you haven't used. So
use this feature with caution!
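The difference between the cached ``all()`` and a fresh ``filter()`` can be pictured with a toy model. ``ToyRelatedManager`` below is a hypothetical stand-in written for this illustration; it is not Django's implementation, and its ``prefetch()`` method and ``queries`` counter are assumptions.

```python
class ToyRelatedManager:
    """Toy stand-in for a related manager; NOT Django's implementation."""

    def __init__(self, fetch_rows):
        self._fetch_rows = fetch_rows  # callable that simulates a database query
        self._prefetched = None        # filled by prefetch(), read by all()
        self.queries = 0               # number of simulated database queries

    def prefetch(self, rows):
        # Simulates the cache that prefetch_related() fills in advance.
        self._prefetched = list(rows)

    def all(self):
        # all() can be answered from the prefetched cache when it exists.
        if self._prefetched is not None:
            return list(self._prefetched)
        self.queries += 1
        return self._fetch_rows()

    def filter(self, **conditions):
        # filter() describes a different query, so the cache is ignored.
        self.queries += 1
        return [
            row
            for row in self._fetch_rows()
            if all(row.get(key) == value for key, value in conditions.items())
        ]


rows = [{"name": "ham", "spicy": False}, {"name": "chili", "spicy": True}]
toppings = ToyRelatedManager(lambda: list(rows))
toppings.prefetch(rows)

cached = toppings.all()              # served from the cache, no query
spicy = toppings.filter(spicy=True)  # triggers a simulated query
print(toppings.queries)
```

If only a subset of the prefetched objects is needed, one option is to filter in Python over the result of ``pizza.toppings.all()``, which keeps using the cache.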
You can also use the normal join syntax to do related fields of related You can also use the normal join syntax to do related fields of related
fields. Suppose we have an additional model to the example above:: fields. Suppose we have an additional model to the example above::
@ -920,7 +951,7 @@ additional queries on the ``ContentType`` table if the relevant rows have not
already been fetched. already been fetched.
``prefetch_related`` in most cases will be implemented using a SQL query that ``prefetch_related`` in most cases will be implemented using a SQL query that
uses the 'IN' operator. This means that for a large QuerySet a large 'IN' clause uses the 'IN' operator. This means that for a large ``QuerySet`` a large 'IN' clause
could be generated, which, depending on the database, might have performance could be generated, which, depending on the database, might have performance
problems of its own when it comes to parsing or executing the SQL query. Always problems of its own when it comes to parsing or executing the SQL query. Always
profile for your use case! profile for your use case!
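The two-query strategy and the ``IN`` clause it relies on can be sketched with the standard-library ``sqlite3`` module. The schema and SQL here are assumptions chosen to resemble the ``Pizza``/``Topping`` example; the statements Django actually emits may differ.

```python
import sqlite3

# Hypothetical schema resembling the Pizza/Topping models; the table and
# column names are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pizza (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE topping (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pizza_toppings (pizza_id INTEGER, topping_id INTEGER);
    INSERT INTO pizza VALUES (1, 'Hawaiian'), (2, 'Seafood');
    INSERT INTO topping VALUES (1, 'ham'), (2, 'pineapple'), (3, 'prawns');
    INSERT INTO pizza_toppings VALUES (1, 1), (1, 2), (2, 3);
    """
)

# Query 1: the primary query.
pizzas = conn.execute("SELECT id, name FROM pizza ORDER BY id").fetchall()
pizza_ids = [pizza_id for pizza_id, _ in pizzas]

# Query 2: fetch the toppings for all pizzas at once. The IN clause needs
# one placeholder per primary key, so it grows with the result set.
placeholders = ", ".join("?" for _ in pizza_ids)
rows = conn.execute(
    f"""
    SELECT pt.pizza_id, t.name
    FROM topping AS t
    JOIN pizza_toppings AS pt ON pt.topping_id = t.id
    WHERE pt.pizza_id IN ({placeholders})
    ORDER BY t.id
    """,
    pizza_ids,
).fetchall()

# The related rows are then matched to their pizzas in Python.
toppings_by_pizza = {pizza_id: [] for pizza_id in pizza_ids}
for pizza_id, topping_name in rows:
    toppings_by_pizza[pizza_id].append(topping_name)

print(toppings_by_pizza)
```

Because the placeholder list has one entry per row of the primary query, a large ``QuerySet`` produces a correspondingly large statement. Some databases also cap the number of bound parameters per statement, which is one more reason to profile.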