From 2e9518bb396c37c48a0464236b714d313b56f10f Mon Sep 17 00:00:00 2001 From: Luke Plant Date: Sat, 16 Jan 2010 03:13:16 +0000 Subject: [PATCH] Created a 'DB optimization' topic, with cross-refs to relevant sections. Also fixed #10291, which was related, and cleaned up some inconsistent doc labels. git-svn-id: http://code.djangoproject.com/svn/django/trunk@12229 bcc190cf-cafb-0310-a4f2-bffc1f526a37 --- docs/faq/models.txt | 2 + docs/index.txt | 3 +- docs/ref/models/querysets.txt | 33 ++-- docs/topics/db/aggregation.txt | 2 +- docs/topics/db/index.txt | 1 + docs/topics/db/optimization.txt | 263 ++++++++++++++++++++++++++++++++ 6 files changed, 293 insertions(+), 11 deletions(-) create mode 100644 docs/topics/db/optimization.txt diff --git a/docs/faq/models.txt b/docs/faq/models.txt index 1272f96f03..42c7d5bc3c 100644 --- a/docs/faq/models.txt +++ b/docs/faq/models.txt @@ -3,6 +3,8 @@ FAQ: Databases and models ========================= +.. _faq-see-raw-sql-queries: + How can I see the raw SQL queries Django is running? ---------------------------------------------------- diff --git a/docs/index.txt b/docs/index.txt index d5b37512c9..d39dbadd6d 100644 --- a/docs/index.txt +++ b/docs/index.txt @@ -71,7 +71,8 @@ The model layer * **Other:** :ref:`Supported databases ` | :ref:`Legacy databases ` | - :ref:`Providing initial data ` + :ref:`Providing initial data ` | + :ref:`Optimize database access ` The template layer ================== diff --git a/docs/ref/models/querysets.txt b/docs/ref/models/querysets.txt index 5c9d33bc83..4740d9ca10 100644 --- a/docs/ref/models/querysets.txt +++ b/docs/ref/models/querysets.txt @@ -66,6 +66,18 @@ You can evaluate a ``QuerySet`` in the following ways: iterating over a ``QuerySet`` will take advantage of your database to load data and instantiate objects only as you need them. 
+ * **bool().** Testing a ``QuerySet`` in a boolean context, such as using + ``bool()``, ``or``, ``and`` or an ``if`` statement, will cause the query + to be executed. If there is at least one result, the ``QuerySet`` is + ``True``, otherwise ``False``. For example:: + + if Entry.objects.filter(headline="Test"): + print "There is at least one Entry with the headline Test" + + Note: *Don't* use this if all you want to do is determine if at least one + result exists, and don't need the actual objects. It's more efficient to + use ``exists()`` (see below). + .. _pickling QuerySets: Pickling QuerySets @@ -302,7 +314,7 @@ a model which defines a default ordering, or when using ordering was undefined prior to calling ``reverse()``, and will remain undefined afterward). -.. _querysets-distinct: +.. _queryset-distinct: ``distinct()`` ~~~~~~~~~~~~~~ @@ -336,6 +348,8 @@ query spans multiple tables, it's possible to get duplicate results when a ``values()`` call. +.. _queryset-values: + ``values(*fields)`` ~~~~~~~~~~~~~~~~~~~ @@ -616,7 +630,7 @@ call, since they are conflicting options. Both the ``depth`` argument and the ability to specify field names in the call to ``select_related()`` are new in Django version 1.0. -.. _extra: +.. _queryset-extra: ``extra(select=None, where=None, params=None, tables=None, order_by=None, select_params=None)`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -1062,17 +1076,18 @@ Example:: If you pass ``in_bulk()`` an empty list, you'll get an empty dictionary. +.. _queryset-iterator: + ``iterator()`` ~~~~~~~~~~~~~~ Evaluates the ``QuerySet`` (by performing the query) and returns an -`iterator`_ over the results. A ``QuerySet`` typically reads all of -its results and instantiates all of the corresponding objects the -first time you access it; ``iterator()`` will instead read results and -instantiate objects in discrete chunks, yielding them one at a -time. 
For a ``QuerySet`` which returns a large number of objects, this -often results in better performance and a significant reduction in -memory use. +`iterator`_ over the results. A ``QuerySet`` typically caches its +results internally so that repeated evaluations do not result in +additional queries; ``iterator()`` will instead read results directly, +without doing any caching at the ``QuerySet`` level. For a +``QuerySet`` which returns a large number of objects, this often +results in better performance and a significant reduction in memory Note that using ``iterator()`` on a ``QuerySet`` which has already been evaluated will force it to evaluate again, repeating the query. diff --git a/docs/topics/db/aggregation.txt b/docs/topics/db/aggregation.txt index 1c1ce20f12..06194eba27 100644 --- a/docs/topics/db/aggregation.txt +++ b/docs/topics/db/aggregation.txt @@ -353,7 +353,7 @@ without any harmful effects, since that is already playing a role in the query. This behavior is the same as that noted in the queryset documentation for -:ref:`distinct() ` and the general rule is the same: +:ref:`distinct() ` and the general rule is the same: normally you won't want extra columns playing a part in the result, so clear out the ordering, or at least make sure it's restricted only to those fields you also select in a ``values()`` call. diff --git a/docs/topics/db/index.txt b/docs/topics/db/index.txt index bf918eba6b..3eb62b70ca 100644 --- a/docs/topics/db/index.txt +++ b/docs/topics/db/index.txt @@ -17,3 +17,4 @@ model maps to a single database table. sql transactions multi-db + optimization diff --git a/docs/topics/db/optimization.txt b/docs/topics/db/optimization.txt new file mode 100644 index 0000000000..6063bc6c2a --- /dev/null +++ b/docs/topics/db/optimization.txt @@ -0,0 +1,263 @@ +.. 
_topics-db-optimization: + +============================ +Database access optimization +============================ + +Django's database layer provides various ways to help developers get the most +out of their databases. This document gathers together links to the relevant +documentation, and adds various tips, organized under a number of headings that +outline the steps to take when attempting to optimize your database usage. + +Profile first +============= + +As general programming practice, this goes without saying. Find out :ref:`what +queries you are doing and what they are costing you +`. You may also want to use an external project like +'django-debug-toolbar', or a tool that monitors your database directly. + +Remember that you may be optimizing for speed or memory or both, depending on +your requirements. Sometimes optimizing for one will be detrimental to the +other, but sometimes they will help each other. Also, work that is done by the +database process might not have the same cost (to you) as the same amount of +work done in your Python process. It is up to you to decide what your +priorities are, where the balance must lie, and profile all of these as required +since this will depend on your application and server. + +With everything that follows, remember to profile after every change to ensure +that the change is a benefit, and a big enough benefit given the decrease in +readability of your code. **All** of the suggestions below come with the caveat +that in your circumstances the general principle might not apply, or might even +be reversed. + +Use standard DB optimization techniques +======================================= + +...including: + +* Indexes. This is a number one priority, *after* you have determined from + profiling what indexes should be added. Use :attr:`django.db.models.Field.db_index` to add + these from Django. + +* Appropriate use of field types. + +We will assume you have done the obvious things above. 
The rest of this document +focuses on how to use Django in such a way that you are not doing unnecessary +work. This document also does not address other optimization techniques that +apply to all expensive operations, such as :ref:`general purpose caching +`. + +Understand QuerySets +==================== + +Understanding :ref:`QuerySets ` is vital to getting good +performance with simple code. In particular: + +Understand QuerySet evaluation +------------------------------ + +To avoid performance problems, it is important to understand: + +* that :ref:`QuerySets are lazy `. + +* when :ref:`they are evaluated `. + +* how :ref:`the data is held in memory `. + +Understand cached attributes +---------------------------- + +As well as caching of the whole ``QuerySet``, there is caching of the result of +attributes on ORM objects. In general, attributes that are not callable will be +cached. For example, assuming the :ref:`example weblog models +`: + + >>> entry = Entry.objects.get(id=1) + >>> entry.blog # Blog object is retrieved at this point + >>> entry.blog # cached version, no DB access + +But in general, callable attributes cause DB lookups every time:: + + >>> entry = Entry.objects.get(id=1) + >>> entry.authors.all() # query performed + >>> entry.authors.all() # query performed again + +Be careful when reading template code - the template system does not allow use +of parentheses, but will call callables automatically, hiding the above +distinction. + +Be careful with your own custom properties - it is up to you to implement +caching. + +Use the ``with`` template tag +----------------------------- + +To make use of the caching behaviour of ``QuerySet``, you may need to use the +:ttag:`with` template tag. + +Use ``iterator()`` +------------------ + +When you have a lot of objects, the caching behaviour of the ``QuerySet`` can +cause a large amount of memory to be used. In this case, +:ref:`QuerySet.iterator() ` may help. 
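As a rough illustration of the note on custom properties above, here is a minimal sketch in plain Python. The ``Entry`` class, its ``lookups`` counter and both property names are hypothetical stand-ins, not Django APIs; the counter just stands in for an expensive database lookup.

```python
class Entry:
    """Hypothetical stand-in for an ORM object; this is not a Django model."""

    def __init__(self, body):
        self.body = body
        self.lookups = 0  # counts simulated expensive lookups

    def _expensive_word_count(self):
        # Stand-in for a database query or other costly work.
        self.lookups += 1
        return len(self.body.split())

    @property
    def word_count_uncached(self):
        # Recomputed on every access.
        return self._expensive_word_count()

    @property
    def word_count(self):
        # Computed on first access, then stored on the instance.
        if not hasattr(self, "_word_count"):
            self._word_count = self._expensive_word_count()
        return self._word_count


entry = Entry("profile first, optimize second")

entry.word_count_uncached
entry.word_count_uncached
uncached_lookups = entry.lookups  # two accesses, two lookups

entry.lookups = 0
entry.word_count
entry.word_count
cached_lookups = entry.lookups  # two accesses, one lookup
```

Storing the value on the instance keeps the cache tied to the lifetime of the object, which is one simple choice among several; whether stale values are acceptable depends on your application.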
+ +Do database work in the database rather than in Python +====================================================== + +For instance: + +* At the most basic level, use :ref:`filter and exclude ` to do + filtering in the database to avoid loading data into your Python process, only + to throw much of it away. + +* Use :ref:`F() object query expressions ` to do filtering + against other fields within the same model. + +* Use :ref:`annotate to do aggregation in the database `. + +If these aren't enough to generate the SQL you need: + +Use ``QuerySet.extra()`` +------------------------ + +A less portable but more powerful method is :ref:`QuerySet.extra() +`, which allows some SQL to be explicitly added to the query. +If that still isn't powerful enough: + +Use raw SQL +----------- + +Write your own :ref:`custom SQL to retrieve data or populate models +`. Use ``django.db.connection.queries`` to find out what Django +is writing for you and start from there. + +Retrieve everything at once if you know you will need it +======================================================== + +Hitting the database multiple times for different parts of a single 'set' of +data that you will need all parts of is, in general, less efficient than +retrieving it all in one query. This is particularly important if you have a +query that is executed in a loop, and could therefore end up doing many database +queries, when only one was needed. So: + +Use ``QuerySet.select_related()`` +--------------------------------- + +Understand :ref:`QuerySet.select_related() ` thoroughly, and use it: + +* in view code, + +* and in :ref:`managers and default managers ` where + appropriate. Be aware when your manager is and is not used; sometimes this is + tricky so don't make assumptions. 
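To make the cost of a query in a loop concrete, the sketch below uses the standard library ``sqlite3`` module directly rather than Django. The ``blog`` and ``entry`` tables and the ``run()`` helper are assumptions made up for this illustration, loosely mirroring the weblog models; it only shows the general one-query-versus-many pattern that ``select_related()`` is meant to address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (
        id INTEGER PRIMARY KEY,
        blog_id INTEGER REFERENCES blog(id),
        headline TEXT
    );
    INSERT INTO blog VALUES (1, 'Django news');
    INSERT INTO blog VALUES (2, 'Python news');
    INSERT INTO entry VALUES (1, 1, 'Release notes');
    INSERT INTO entry VALUES (2, 1, 'Sprint report');
    INSERT INTO entry VALUES (3, 2, 'PEP roundup');
""")

executed = []  # SQL statements issued through run()


def run(sql, params=()):
    # Hypothetical helper: record the statement, then execute it.
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# One query for the entries, then one more per entry for its blog.
looped = []
for headline, blog_id in run("SELECT headline, blog_id FROM entry ORDER BY id"):
    rows = run("SELECT name FROM blog WHERE id = ?", (blog_id,))
    looped.append((headline, rows[0][0]))
looped_queries = len(executed)

executed.clear()

# The same data retrieved in a single query, using a join.
joined = run(
    "SELECT entry.headline, blog.name FROM entry "
    "JOIN blog ON blog.id = entry.blog_id ORDER BY entry.id"
)
joined_queries = len(executed)
```

With three entries the loop issues four statements and the join issues one; the gap grows with the number of rows, which is why profiling a view that loops over related objects is worthwhile.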
+ +Don't retrieve things you don't need +==================================== + +Use ``QuerySet.values()`` and ``values_list()`` +----------------------------------------------- + +When you just want a dict/list of values, and don't need ORM model objects, make +appropriate usage of :ref:`QuerySet.values() `. +These can be useful for replacing model objects in template code - as long as +the dicts you supply have the same attributes as those used in the template, you +are fine. + +Use ``QuerySet.defer()`` and ``only()`` +--------------------------------------- + +Use :ref:`defer() and only() ` if there are database columns you +know that you won't need (or won't need in most cases) to avoid loading +them. Note that if you *do* use them, the ORM will have to go and get them in a +separate query, making this a pessimization if you use it inappropriately. + +Use QuerySet.count() +-------------------- + +...if you only want the count, rather than doing ``len(queryset)``. + +Use QuerySet.exists() +--------------------- + +...if you only want to find out if at least one result exists, rather than ``if +queryset``. + +But: + +Don't overuse ``count()`` and ``exists()`` +------------------------------------------ + +If you are going to need other data from the QuerySet, just evaluate it. + +For example, assuming an Email class that has a ``body`` attribute and a +many-to-many relation to User, the following template code is optimal: + +.. code-block:: html+django + + {% if display_inbox %} + {% with user.emails.all as emails %} + {% if emails %} +

You have {{ emails|length }} email(s)

+ {% for email in emails %} +

{{ email.body }}

+ {% endfor %} + {% else %} +

No messages today.

+ {% endif %} + {% endwith %} + {% endif %} + + +It is optimal because: + + 1. Since QuerySets are lazy, this does no database queries if 'display_inbox' is False. + + #. Use of ``with`` means that we store ``user.emails.all`` in a variable for + later use, allowing its cache to be re-used. + + #. The line ``{% if emails %}`` causes ``QuerySet.__nonzero__()`` to be called, + which causes the ``user.emails.all()`` query to be run on the database, and + at least the first row to be turned into an ORM object. If there aren't + any results, it will return False, otherwise True. + + #. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling + out the rest of the cache without doing another query. + + #. The ``for`` loop iterates over the already filled cache. + +In total, this code does either one or zero database queries. The only +deliberate optimization performed is the use of the ``with`` tag. Using +``QuerySet.exists()`` or ``QuerySet.count()`` at any point would cause +additional queries. + +Use ``QuerySet.update()`` and ``delete()`` +------------------------------------------ + +Rather than retrieve a load of objects, set some values, and save them +individually, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update() +`. Similarly, do :ref:`bulk deletes +` where possible. + +Note, however, that these bulk update methods cannot call the ``save()`` or ``delete()`` +methods of individual instances, which means that any custom behaviour you have +added for these methods will not be executed, including anything driven from the +normal database object :ref:`signals `. + +Don't retrieve things you already have +====================================== + +Use foreign key values directly +------------------------------- + +If you only need a foreign key value, use the foreign key value that is already on +the object you've got, rather than getting the whole related object and taking +its primary key. i.e. 
do:: + + entry.blog_id + +instead of:: + + entry.blog.id +
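The reasoning behind this tip can be sketched at the SQL level. The example uses the standard library ``sqlite3`` module, not Django, and the ``blog`` and ``entry`` tables are hypothetical, named only to echo the weblog example: the foreign key column is already part of the fetched row, so looking up the related row just to read its id is redundant work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access to columns by name
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (
        id INTEGER PRIMARY KEY,
        blog_id INTEGER REFERENCES blog(id),
        headline TEXT
    );
    INSERT INTO blog VALUES (7, 'Django news');
    INSERT INTO entry VALUES (1, 7, 'Release notes');
""")

entry_row = conn.execute(
    "SELECT id, blog_id, headline FROM entry WHERE id = 1"
).fetchone()

# Direct route: the foreign key value is already in the row we hold.
blog_id_direct = entry_row["blog_id"]

# Roundabout route: a second query, only to read back the same value.
blog_row = conn.execute(
    "SELECT id, name FROM blog WHERE id = ?", (entry_row["blog_id"],)
).fetchone()
blog_id_via_lookup = blog_row["id"]
```

Both routes yield the same value; the second simply costs an extra query.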