Created a 'DB optimization' topic, with cross-refs to relevant sections.

Also fixed #10291, which was related, and cleaned up some inconsistent doc labels.

git-svn-id: http://code.djangoproject.com/svn/django/trunk@12229 bcc190cf-cafb-0310-a4f2-bffc1f526a37
2010-01-16 03:13:16 +00:00 · 2010-01-16 03:13:16 +00:00 · 2e9518bb39
parent 19fad16414
commit 2e9518bb39
6 changed files with 293 additions and 11 deletions
--- a/docs/faq/models.txt
+++ b/docs/faq/models.txt
@@ -3,6 +3,8 @@
 FAQ: Databases and models
 =========================

+.. _faq-see-raw-sql-queries:
+
 How can I see the raw SQL queries Django is running?
 ----------------------------------------------------

--- a/docs/index.txt
+++ b/docs/index.txt
@@ -71,7 +71,8 @@ The model layer
    * **Other:**
      :ref:`Supported databases <ref-databases>` |
      :ref:`Legacy databases <howto-legacy-databases>` |
-      :ref:`Providing initial data <howto-initial-data>`
+      :ref:`Providing initial data <howto-initial-data>` |
+      :ref:`Optimize database access <topics-db-optimization>`

 The template layer
 ==================
--- a/docs/ref/models/querysets.txt
+++ b/docs/ref/models/querysets.txt
@@ -66,6 +66,18 @@ You can evaluate a ``QuerySet`` in the following ways:
      iterating over a ``QuerySet`` will take advantage of your database to
      load data and instantiate objects only as you need them.

+    * **bool().** Testing a ``QuerySet`` in a boolean context, such as using
+      ``bool()``, ``or``, ``and`` or an ``if`` statement, will cause the query
+      to be executed. If there is at least one result, the ``QuerySet`` is
+      ``True``, otherwise ``False``. For example::
+
+          if Entry.objects.filter(headline="Test"):
+              print("There is at least one Entry with the headline Test")
+
+      Note: *Don't* use this if all you want to do is determine whether at least
+      one result exists and you don't need the actual objects. It's more
+      efficient to use ``exists()`` (see below).
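The cost difference can be sketched at the SQL level with Python's built-in ``sqlite3`` module. This is not Django code: the ``entry`` table, its data and both queries are hypothetical stand-ins, assumed to approximate a boolean test (which pulls matching rows into Python) versus an existence check (which lets the database stop at the first match).

```python
import sqlite3

# Hypothetical table standing in for the Entry model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO entry (headline, body) VALUES (?, ?)",
    [("Test", "x" * 1000) for _ in range(500)],
)

# Boolean-test style: every matching row (all columns) is sent to Python first.
rows = conn.execute("SELECT * FROM entry WHERE headline = ?", ("Test",)).fetchall()
has_entries_bool = bool(rows)

# Existence-check style: the database can stop at the first match
# and returns a single 0/1 value.
(has_entries_exists,) = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM entry WHERE headline = ?)", ("Test",)
).fetchone()

print(has_entries_bool, bool(has_entries_exists), len(rows))
```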
+
 .. _pickling QuerySets:

 Pickling QuerySets
@@ -302,7 +314,7 @@ a model which defines a default ordering, or when using
 ordering was undefined prior to calling ``reverse()``, and will remain
 undefined afterward).

-.. _querysets-distinct:
+.. _queryset-distinct:

 ``distinct()``
 ~~~~~~~~~~~~~~
@@ -336,6 +348,8 @@ query spans multiple tables, it's possible to get duplicate results when a
    ``values()`` call.


+.. _queryset-values:
+
 ``values(*fields)``
 ~~~~~~~~~~~~~~~~~~~

@@ -616,7 +630,7 @@ call, since they are conflicting options.
 Both the ``depth`` argument and the ability to specify field names in the call
 to ``select_related()`` are new in Django version 1.0.

-.. _extra:
+.. _queryset-extra:

 ``extra(select=None, where=None, params=None, tables=None, order_by=None, select_params=None)``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1062,17 +1076,18 @@ Example::

 If you pass ``in_bulk()`` an empty list, you'll get an empty dictionary.

+.. _queryset-iterator:
+
 ``iterator()``
 ~~~~~~~~~~~~~~

 Evaluates the ``QuerySet`` (by performing the query) and returns an
-`iterator`_ over the results. A ``QuerySet`` typically reads all of
-its results and instantiates all of the corresponding objects the
-first time you access it; ``iterator()`` will instead read results and
-instantiate objects in discrete chunks, yielding them one at a
-time. For a ``QuerySet`` which returns a large number of objects, this
-often results in better performance and a significant reduction in
-memory use.
+`iterator`_ over the results. A ``QuerySet`` typically caches its
+results internally so that repeated evaluations do not result in
+additional queries; ``iterator()`` will instead read results directly,
+without doing any caching at the ``QuerySet`` level. For a
+``QuerySet`` which returns a large number of objects, this often
+results in better performance and a significant reduction in memory
+use.

 Note that using ``iterator()`` on a ``QuerySet`` which has already
 been evaluated will force it to evaluate again, repeating the query.
--- a/docs/topics/db/aggregation.txt
+++ b/docs/topics/db/aggregation.txt
@@ -353,7 +353,7 @@ without any harmful effects, since that is already playing a role in the
 query.

 This behavior is the same as that noted in the queryset documentation for
-:ref:`distinct() <querysets-distinct>` and the general rule is the same:
+:ref:`distinct() <queryset-distinct>` and the general rule is the same:
 normally you won't want extra columns playing a part in the result, so clear
 out the ordering, or at least make sure it's restricted only to those fields
 you also select in a ``values()`` call.
--- a/docs/topics/db/index.txt
+++ b/docs/topics/db/index.txt
@@ -17,3 +17,4 @@ model maps to a single database table.
   sql
   transactions
   multi-db
+   optimization
--- a/docs/topics/db/optimization.txt
+++ b/docs/topics/db/optimization.txt
@@ -0,0 +1,263 @@
+.. _topics-db-optimization:
+
+============================
+Database access optimization
+============================
+
+Django's database layer provides various ways to help developers get the most
+out of their databases. This document gathers together links to the relevant
+documentation, and adds various tips, organized under a number of headings that
+outline the steps to take when attempting to optimize your database usage.
+
+Profile first
+=============
+
+As general programming practice, this goes without saying. Find out :ref:`what
+queries you are doing and what they are costing you
+<faq-see-raw-sql-queries>`. You may also want to use an external project like
+'django-debug-toolbar', or a tool that monitors your database directly.
+
+Remember that you may be optimizing for speed or memory or both, depending on
+your requirements. Sometimes optimizing for one will be detrimental to the
+other, but sometimes they will help each other. Also, work that is done by the
+database process might not have the same cost (to you) as the same amount of
+work done in your Python process. It is up to you to decide what your
+priorities are and where the balance must lie, and to profile all of these as
+required, since this will depend on your application and server.
+
+With everything that follows, remember to profile after every change to ensure
+that the change is a benefit, and a big enough benefit given the decrease in
+readability of your code. **All** of the suggestions below come with the caveat
+that in your circumstances the general principle might not apply, or might even
+be reversed.
+
+Use standard DB optimization techniques
+=======================================
+
+...including:
+
+* Indexes. This is a number one priority, *after* you have determined from
+  profiling what indexes should be added. Use :attr:`django.db.models.Field.db_index` to add
+  these from Django.
+
+* Appropriate use of field types.
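To check that an index is actually used, you can ask the database's query planner. A minimal sketch with ``sqlite3`` and SQLite's ``EXPLAIN QUERY PLAN``, assuming a hypothetical ``entry`` table and index name; the wording of plans differs between SQLite versions and between databases, so only the ``SCAN`` keyword and the index name are relied on. Setting ``db_index=True`` on a field asks Django to create an index comparable to the one created by hand here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT)")


def plan(sql, params=()):
    """Return the planner's human-readable description lines for a query."""
    # The description is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


query = "SELECT id FROM entry WHERE headline = ?"
before = plan(query, ("Test",))  # no index yet: a full table scan

conn.execute("CREATE INDEX entry_headline_idx ON entry (headline)")
after = plan(query, ("Test",))   # the planner can now search the index

print(before)
print(after)
```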
+
+We will assume you have done the obvious things above. The rest of this document
+focuses on how to use Django in such a way that you are not doing unnecessary
+work. This document also does not address other optimization techniques that
+apply to all expensive operations, such as :ref:`general purpose caching
+<topics-cache>`.
+
+Understand QuerySets
+====================
+
+Understanding :ref:`QuerySets <ref-models-querysets>` is vital to getting good
+performance with simple code. In particular:
+
+Understand QuerySet evaluation
+------------------------------
+
+To avoid performance problems, it is important to understand:
+
+* that :ref:`QuerySets are lazy <querysets-are-lazy>`.
+
+* when :ref:`they are evaluated <when-querysets-are-evaluated>`.
+
+* how :ref:`the data is held in memory <caching-and-querysets>`.
+
+Understand cached attributes
+----------------------------
+
+As well as caching of the whole ``QuerySet``, there is caching of the result of
+attributes on ORM objects. In general, attributes that are not callable will be
+cached. For example, assuming the :ref:`example weblog models
+<queryset-model-example>`::
+
+  >>> entry = Entry.objects.get(id=1)
+  >>> entry.blog   # Blog object is retrieved at this point
+  >>> entry.blog   # cached version, no DB access
+
+But in general, callable attributes cause DB lookups every time::
+
+  >>> entry = Entry.objects.get(id=1)
+  >>> entry.authors.all()   # query performed
+  >>> entry.authors.all()   # query performed again
+
+Be careful when reading template code - the template system does not allow use
+of parentheses, but will call callables automatically, hiding the above
+distinction.
+
+Be careful with your own custom properties - it is up to you to implement
+caching.
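A plain-Python sketch of the distinction, with no database involved. ``Entry``, ``fake_query()`` and ``QUERY_COUNT`` are hypothetical stand-ins that only count how often the "database" would be hit: the ``authors()`` method repeats its work on every call, while the ``blog`` property stores its result on the instance via ``functools.cached_property`` (Python 3.8 or later), which is one way to implement the caching yourself.

```python
from functools import cached_property

QUERY_COUNT = 0


def fake_query(result):
    """Hypothetical stand-in for a database round trip."""
    global QUERY_COUNT
    QUERY_COUNT += 1
    return result


class Entry:
    def __init__(self, blog_id):
        self.blog_id = blog_id

    def authors(self):
        # Like entry.authors.all(): a new "query" on every call.
        return fake_query(["alice", "bob"])

    @cached_property
    def blog(self):
        # Computed on first access, then stored on the instance.
        return fake_query({"id": self.blog_id, "name": "Weblog"})


entry = Entry(blog_id=1)
entry.blog       # first access: one "query"
entry.blog       # served from the instance, no further "query"
entry.authors()  # one "query"
entry.authors()  # and another one
print(QUERY_COUNT)
```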
+
+Use the ``with`` template tag
+-----------------------------
+
+To make use of the caching behaviour of ``QuerySet``, you may need to use the
+:ttag:`with` template tag.
+
+Use ``iterator()``
+------------------
+
+When you have a lot of objects, the caching behaviour of the ``QuerySet`` can
+cause a large amount of memory to be used. In this case,
+:ref:`QuerySet.iterator() <queryset-iterator>` may help.
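The trade-off resembles the one between fetching every row into a list and streaming rows from a database cursor. A sketch with ``sqlite3`` rather than Django, with a hypothetical table: ``fetchall()`` builds the whole result list in memory, much as a cached ``QuerySet`` does, while looping over the cursor pulls rows as the loop asks for them and keeps none of them afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, rating INTEGER)")
conn.executemany(
    "INSERT INTO entry (rating) VALUES (?)", [(i % 5,) for i in range(10000)]
)

# Cached style: all 10,000 rows are materialised at once and kept around.
all_rows = conn.execute("SELECT rating FROM entry").fetchall()
cached_total = sum(rating for (rating,) in all_rows)

# Streaming style: rows are pulled from the cursor as the loop needs them.
streamed_total = 0
for (rating,) in conn.execute("SELECT rating FROM entry"):
    streamed_total += rating

print(cached_total, streamed_total)
```

Looping over the streamed results a second time would mean executing the query again.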
+
+Do database work in the database rather than in Python
+======================================================
+
+For instance:
+
+* At the most basic level, use :ref:`filter and exclude <queryset-api>` to do
+  filtering in the database, to avoid loading data into your Python process only
+  to throw much of it away.
+
+* Use :ref:`F() object query expressions <query-expressions>` to do filtering
+  against other fields within the same model.
+
+* Use :ref:`annotate to do aggregation in the database <topics-db-aggregation>`.
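In SQL terms, these correspond roughly to a ``WHERE`` clause, a ``WHERE`` clause that compares two columns of the same row, and ``GROUP BY`` with an aggregate. A sketch using ``sqlite3``, with the Python-side alternative shown first for contrast; the table, columns and data are hypothetical, and the SQL is illustrative rather than what Django generates verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, blog_id INTEGER,"
    " n_comments INTEGER, n_pingbacks INTEGER)"
)
conn.executemany(
    "INSERT INTO entry (blog_id, n_comments, n_pingbacks) VALUES (?, ?, ?)",
    [(1, 10, 2), (1, 0, 5), (2, 7, 7), (2, 3, 1)],
)

# Python-side filtering: every row crosses into Python, most are thrown away.
in_python = [r for r in conn.execute("SELECT id, n_comments FROM entry") if r[1] > 5]

# filter()-style: the database does the filtering.
in_db = conn.execute("SELECT id, n_comments FROM entry WHERE n_comments > 5").fetchall()

# F()-style: compare two fields of the same row inside the query.
more_comments = conn.execute(
    "SELECT id FROM entry WHERE n_comments > n_pingbacks"
).fetchall()

# annotate()-style: aggregate per group in the database.
per_blog = conn.execute(
    "SELECT blog_id, SUM(n_comments) FROM entry GROUP BY blog_id ORDER BY blog_id"
).fetchall()

print(in_db, more_comments, per_blog)
```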
+
+If these aren't enough to generate the SQL you need:
+
+Use ``QuerySet.extra()``
+------------------------
+
+A less portable but more powerful method is :ref:`QuerySet.extra()
+<queryset-extra>`, which allows some SQL to be explicitly added to the query.
+If that still isn't powerful enough:
+
+Use raw SQL
+-----------
+
+Write your own :ref:`custom SQL to retrieve data or populate models
+<topics-db-sql>`. Use ``django.db.connection.queries`` (populated when
+``DEBUG`` is ``True``) to find out what Django is writing for you and start
+from there.
+
+Retrieve everything at once if you know you will need it
+========================================================
+
+Hitting the database multiple times for different parts of a single 'set' of
+data that you will need in full is, in general, less efficient than retrieving
+it all in one query. This is particularly important if you have a query that is
+executed in a loop, and could therefore end up doing many database queries when
+only one was needed. So:
+
+Use ``QuerySet.select_related()``
+---------------------------------
+
+Understand :ref:`QuerySet.select_related() <select-related>` thoroughly, and use it:
+
+* in view code,
+
+* and in :ref:`managers and default managers <topics-db-managers>` where
+  appropriate. Be aware when your manager is and is not used; sometimes this is
+  tricky so don't make assumptions.
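The pattern to avoid is often called the "N+1 queries" problem. A sketch with ``sqlite3`` that counts executed statements through ``Connection.set_trace_callback()``: looking up each entry's blog inside a loop issues one extra query per entry, while a single ``JOIN``, roughly the kind of query ``select_related()`` produces, fetches everything at once. Tables and data are hypothetical.

```python
import sqlite3

# isolation_level=None: autocommit, so no implicit BEGIN shows up in the trace.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT,
                        blog_id INTEGER REFERENCES blog(id));
    INSERT INTO blog (id, name) VALUES (1, 'Cheddar'), (2, 'Stilton');
    INSERT INTO entry (headline, blog_id) VALUES ('a', 1), ('b', 2), ('c', 1), ('d', 2);
""")

statements = []
conn.set_trace_callback(statements.append)  # record every statement executed

# One query for the entries, then one more per entry for its blog.
for headline, blog_id in conn.execute("SELECT headline, blog_id FROM entry").fetchall():
    conn.execute("SELECT name FROM blog WHERE id = ?", (blog_id,)).fetchone()
n_plus_one = len(statements)

statements.clear()
# One query that joins the related table.
joined = conn.execute(
    "SELECT entry.headline, blog.name FROM entry JOIN blog ON blog.id = entry.blog_id"
).fetchall()
single = len(statements)

print(n_plus_one, single)
```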
+
+Don't retrieve things you don't need
+====================================
+
+Use ``QuerySet.values()`` and ``values_list()``
+-----------------------------------------------
+
+When you just want a dict or list of values, and don't need ORM model objects,
+make appropriate use of :ref:`QuerySet.values() <queryset-values>` and
+``values_list()``. These can be useful for replacing model objects in template
+code - as long as the dicts you supply have keys matching the attributes used in
+the template, you are fine.
+
+Use ``QuerySet.defer()`` and ``only()``
+---------------------------------------
+
+Use :ref:`defer() and only() <queryset-defer>` if there are database columns you
+know that you won't need (or won't need in most cases) to avoid loading
+them. Note that if you *do* access the deferred fields, the ORM will have to go
+and get them in a separate query, making this a pessimization if you use it
+inappropriately.
+
+Use ``QuerySet.count()``
+------------------------
+
+...if you only want the count, rather than doing ``len(queryset)``.
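The difference at the SQL level, sketched with ``sqlite3`` on a hypothetical table: counting with ``len()`` transfers every row into Python first, whereas a ``COUNT(*)`` query, the kind of statement ``count()`` issues, returns a single integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO entry (body) VALUES (?)", [("x" * 500,) for _ in range(1000)]
)

# len()-style: fetch every row (and every column) just to count them.
by_len = len(conn.execute("SELECT * FROM entry").fetchall())

# count()-style: the database counts and sends back one number.
(by_count,) = conn.execute("SELECT COUNT(*) FROM entry").fetchone()

print(by_len, by_count)
```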
+
+Use ``QuerySet.exists()``
+-------------------------
+
+...if you only want to find out if at least one result exists, rather than ``if
+queryset``.
+
+But:
+
+Don't overuse ``count()`` and ``exists()``
+------------------------------------------
+
+If you are going to need other data from the QuerySet, just evaluate it.
+
+For example, assuming an Email class that has a ``body`` attribute and a
+many-to-many relation to User, the following template code is optimal:
+
+.. code-block:: html+django
+
+   {% if display_inbox %}
+     {% with user.emails.all as emails %}
+       {% if emails %}
+         <p>You have {{ emails|length }} email(s)</p>
+         {% for email in emails %}
+           <p>{{ email.body }}</p>
+         {% endfor %}
+       {% else %}
+         <p>No messages today.</p>
+       {% endif %}
+     {% endwith %}
+   {% endif %}
+
+
+It is optimal because:
+
+ 1. Since QuerySets are lazy, this does no database queries if
+    ``display_inbox`` is False.
+
+ #. Use of ``with`` means that we store ``user.emails.all`` in a variable for
+    later use, allowing its cache to be re-used.
+
+ #. The line ``{% if emails %}`` causes ``QuerySet.__nonzero__()`` to be called,
+    which causes the ``user.emails.all()`` query to be run on the database, and
+    at least the first row to be turned into an ORM object. If there aren't
+    any results, it will return False, otherwise True.
+
+ #. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling
+    out the rest of the cache without doing another query.
+
+ #. The ``for`` loop iterates over the already filled cache.
+
+In total, this code does either one or zero database queries. The only
+deliberate optimization performed is the use of the ``with`` tag. Using
+``QuerySet.exists()`` or ``QuerySet.count()`` at any point would cause
+additional queries.
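To make the query accounting concrete, here is a plain-Python sketch of a lazy, caching result set that mimics the behaviour listed above. ``LazyResults`` and ``render_inbox()`` are hypothetical and much simpler than Django's ``QuerySet`` (for instance, the boolean test here fills the whole cache at once); the sketch is written for Python 3, where ``__bool__`` plays the role of ``__nonzero__``.

```python
class LazyResults:
    """Hypothetical stand-in for a lazily evaluated, caching QuerySet."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that performs the "query"
        self._cache = None
        self.queries = 0

    def _fill_cache(self):
        # Run the "query" only once; later calls reuse the cache.
        if self._cache is None:
            self.queries += 1
            self._cache = list(self._run_query())

    def __bool__(self):  # {% if emails %}
        self._fill_cache()
        return bool(self._cache)

    def __len__(self):  # {{ emails|length }}
        self._fill_cache()
        return len(self._cache)

    def __iter__(self):  # {% for email in emails %}
        self._fill_cache()
        return iter(self._cache)


def render_inbox(emails, display_inbox):
    lines = []
    if display_inbox:
        if emails:
            lines.append("You have %d email(s)" % len(emails))
            lines.extend(emails)
        else:
            lines.append("No messages today.")
    return lines


emails = LazyResults(lambda: ["Hello", "Meeting at 3"])
output = render_inbox(emails, display_inbox=True)
print(output, emails.queries)

hidden = LazyResults(lambda: ["Hello"])
render_inbox(hidden, display_inbox=False)
print(hidden.queries)
```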
+
+Use ``QuerySet.update()`` and ``delete()``
+------------------------------------------
+
+Rather than retrieve a load of objects, set some values, and save them
+individually, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update()
+<topics-db-queries-update>`. Similarly, do :ref:`bulk deletes
+<topics-db-queries-delete>` where possible.
+
+Note, however, that these bulk update methods cannot call the ``save()`` or ``delete()``
+methods of individual instances, which means that any custom behaviour you have
+added for these methods will not be executed, including anything driven from the
+normal database object :ref:`signals <ref-signals>`.
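A ``sqlite3`` sketch of the difference in round trips, counting ``UPDATE`` statements with ``set_trace_callback()``: updating objects one at a time issues one ``UPDATE`` per row, while a single ``UPDATE ... WHERE``, the kind of statement ``QuerySet.update()`` issues, changes them all at once. The table and data are hypothetical.

```python
import sqlite3

# isolation_level=None: autocommit, so no implicit BEGIN statements are issued.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, blog_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO entry (blog_id, status) VALUES (?, ?)",
    [(i % 2, "draft") for i in range(10)],
)

statements = []
conn.set_trace_callback(statements.append)

# Row by row: fetch the ids, then one UPDATE per object (like save() in a loop).
ids = [row[0] for row in conn.execute("SELECT id FROM entry WHERE blog_id = 1")]
for entry_id in ids:
    conn.execute("UPDATE entry SET status = 'published' WHERE id = ?", (entry_id,))
per_row_updates = sum(1 for s in statements if s.startswith("UPDATE"))

statements.clear()
# Bulk: a single UPDATE statement for every matching row.
conn.execute("UPDATE entry SET status = 'archived' WHERE blog_id = 1")
bulk_updates = sum(1 for s in statements if s.startswith("UPDATE"))

print(per_row_updates, bulk_updates)
```

Because the bulk form never loads the rows into Python, there is no per-object code to run, which is why custom ``save()`` logic is bypassed.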
+
+Don't retrieve things you already have
+======================================
+
+Use foreign key values directly
+-------------------------------
+
+If you only need a foreign key value, use the foreign key value that is already on
+the object you've got, rather than getting the whole related object and taking
+its primary key. That is, do::
+
+   entry.blog_id
+
+instead of::
+
+   entry.blog.id
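Why this saves a query can be sketched with ``sqlite3``: the row for the entry already carries its ``blog_id`` column, whereas reading the blog's ``id`` through the relation means loading the blog row first. Table names and data are hypothetical, and counting statements with ``set_trace_callback()`` stands in for inspecting Django's query log.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (id INTEGER PRIMARY KEY, blog_id INTEGER REFERENCES blog(id));
    INSERT INTO blog (id, name) VALUES (7, 'Cheddar');
    INSERT INTO entry (id, blog_id) VALUES (1, 7);
""")

statements = []
conn.set_trace_callback(statements.append)

entry = conn.execute("SELECT id, blog_id FROM entry WHERE id = 1").fetchone()

# Like entry.blog_id: the value is already in the row we have.
blog_id_direct = entry[1]
after_direct = len(statements)

# Like entry.blog.id: load the related row, then read its primary key.
blog_row = conn.execute("SELECT id, name FROM blog WHERE id = ?", (entry[1],)).fetchone()
blog_id_via_relation = blog_row[0]
after_relation = len(statements)

print(blog_id_direct, blog_id_via_relation, after_direct, after_relation)
```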
+