Added documentation to explain the gains and losses when using utf8_bin
collation in MySQL. This should help people to make a reasonably informed
decision. Usually, leaving the MySQL collation alone will be the best solution,
but if you must change it, this gives a start to the information you need and
pointers to the appropriate place in the MySQL docs.

There's a small chance I also got all the necessary Sphinx markup correct, too
(it builds without errors, but I may have missed some chances for glory and
linkage).

Fixed #2335, #8506.


git-svn-id: http://code.djangoproject.com/svn/django/trunk@8568 bcc190cf-cafb-0310-a4f2-bffc1f526a37
Malcolm Tredinnick 2008-08-26 01:59:25 +00:00
parent b2c2c3a2ed
commit f2b389b354
3 changed files with 81 additions and 10 deletions


@ -95,6 +95,65 @@ This ensures all tables and columns will use UTF-8 by default.
.. _create your database: http://dev.mysql.com/doc/refman/5.0/en/create-database.html
.. _mysql-collation:
Collation settings
~~~~~~~~~~~~~~~~~~
The collation setting for a column controls the order in which data is sorted
as well as what strings compare as equal. It can be set on a database-wide
level and also per-table and per-column. This is `documented thoroughly`_ in
the MySQL documentation. In all cases, you set the collation by directly
manipulating the database tables; Django doesn't provide a way to set this on
the model definition.
.. _documented thoroughly: http://dev.mysql.com/doc/refman/5.0/en/charset.html
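As a rough illustration of setting the collation directly on a table, the sketch below builds a MySQL ``ALTER TABLE`` statement for a single column. The helper name ``collation_sql`` and the table and column names are hypothetical. Because ``MODIFY`` takes a complete column definition, the ``attributes`` argument is assumed to carry anything else the column needs, such as ``NOT NULL``.

```python
def collation_sql(table, column, data_type, collation="utf8_bin", attributes=""):
    """Build an ALTER TABLE statement that changes one column's collation.

    All names are placed into the SQL text unescaped, so only pass trusted,
    hard-coded identifiers.
    """
    statement = "ALTER TABLE `%s` MODIFY `%s` %s CHARACTER SET utf8 COLLATE %s" % (
        table, column, data_type, collation)
    if attributes:
        # Re-state the remaining column attributes after the collation.
        statement += " " + attributes
    return statement


# Hypothetical table and column, for illustration only.
print(collation_sql("myapp_person", "name", "VARCHAR(100)", attributes="NOT NULL"))
```

The resulting statement would be run with whatever database client you normally use; Django itself is not involved.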
By default, with a UTF-8 database, MySQL will use the
``utf8_general_ci`` collation. This results in all string equality
comparisons being done in a *case-insensitive* manner. That is, ``"Fred"`` and
``"freD"`` are considered equal at the database level. If you have a unique
constraint on a field, it would be illegal to try to insert both ``"aa"`` and
``"AA"`` into the same column, since they compare as equal (and, hence,
non-unique) with the default collation.
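The effect on unique constraints can be mimicked in plain Python. This is only a sketch: it approximates a case-insensitive collation by lower-casing values before comparing them, which is not how MySQL implements collations, and the ``would_violate_unique`` helper is hypothetical.

```python
def would_violate_unique(existing_values, new_value, case_sensitive=False):
    """Return True if new_value collides with a value already stored."""
    if case_sensitive:
        # Roughly what a binary collation such as utf8_bin does.
        return new_value in existing_values
    # Rough stand-in for a case-insensitive collation.
    return new_value.lower() in {value.lower() for value in existing_values}


stored = ["aa"]
print(would_violate_unique(stored, "AA"))                       # True
print(would_violate_unique(stored, "AA", case_sensitive=True))  # False
```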
In many cases, this default will not be a problem. However, if you really want
case-sensitive comparisons on a particular column or table, you would change
the column or table to use the ``utf8_bin`` collation. The main thing to be
aware of in this case is that if you are using MySQLdb 1.2.2, the database
backend in Django will then return bytestrings (instead of unicode strings)
for any character fields it receives from the database. This is a strong
variation from Django's normal practice of *always* returning unicode
strings. It is up to you, the developer, to handle the fact that you will
receive bytestrings if you configure your table(s) to use ``utf8_bin``
collation. Django itself should work smoothly with such columns, but your
code must be prepared to call ``django.utils.encoding.smart_unicode()`` at
times if it really wants to work with consistent data -- Django will not do
this for you (the database backend layer and the model population layer are
separated internally so the database layer doesn't know it needs to make
this conversion in this one particular case).
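A minimal sketch of that normalisation step, written as standalone Python 3 rather than against Django itself. The ``to_text`` helper is a hypothetical stand-in for the conversion that ``smart_unicode()`` is described as performing above, and it assumes the column data is UTF-8 encoded.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding it if it is a bytestring."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# A utf8_bin column may hand back bytes; other columns hand back text.
rows = [b"Fred", "freD", b"caf\xc3\xa9"]
print([to_text(value) for value in rows])
```

Calling the helper on every value read from an affected column keeps the rest of the code working with one string type.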
If you're using MySQLdb 1.2.1p2, Django's standard
:class:`~django.db.models.CharField` class will return unicode strings even
with ``utf8_bin`` collation. However, :class:`~django.db.models.TextField`
fields will be returned as an ``array.array`` instance (from Python's standard
``array`` module). There isn't a lot Django can do about that, since, again,
the information needed to make the necessary conversions isn't available when
the data is read in from the database. This problem was `fixed in MySQLdb
1.2.2`_, so if you want to use :class:`~django.db.models.TextField` with
``utf8_bin`` collation, upgrading to version 1.2.2 and then dealing with the
bytestrings (which shouldn't be too difficult) is the recommended solution.
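If upgrading is not an option, the ``array.array`` values can in principle be converted by hand. The sketch below is written for Python 3 and is an assumption-laden illustration: ``text_from_array`` is a hypothetical helper, the data is assumed to be UTF-8, and the array is built with the ``"B"`` typecode purely to have something to demonstrate with.

```python
import array


def text_from_array(value, encoding="utf-8"):
    """Decode an array.array of raw bytes into text; pass other values through."""
    if isinstance(value, array.array):
        return value.tobytes().decode(encoding)
    return value


# Simulate the kind of value a TextField column might come back as.
raw = array.array("B", "some long text".encode("utf-8"))
print(text_from_array(raw))
```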
Should you decide to use ``utf8_bin`` collation for some of your tables with
MySQLdb 1.2.1p2, you should still use ``utf8_general_ci`` (the default)
collation for the :class:`django.contrib.sessions.models.Session` table
(usually called ``django_session``) and the
:class:`django.contrib.admin.models.LogEntry` table (usually called
``django_admin_log``). Those are the two standard tables that use
:class:`~django.db.models.TextField` internally.
.. _fixed in MySQLdb 1.2.2: http://sourceforge.net/tracker/index.php?func=detail&aid=1495765&group_id=22307&atid=374932
Connecting to the database
--------------------------


@ -340,6 +340,14 @@ The admin represents this as an ``<input type="text">`` (a single-line input).
The maximum length (in characters) of the field. The max_length is enforced
at the database level and in Django's validation.
.. admonition:: MySQL users
If you are using this field with MySQLdb 1.2.2 and the ``utf8_bin``
collation (which is *not* the default), there are some issues to be aware
of. Refer to the :ref:`MySQL database notes <mysql-collation>` for
details.
``CommaSeparatedIntegerField``
------------------------------
@ -689,6 +697,13 @@ Like an :class:`IntegerField`, but only allows values under a certain
A large text field. The admin represents this as a ``<textarea>`` (a multi-line
input).
.. admonition:: MySQL users
If you are using this field with MySQLdb 1.2.1p2 and the ``utf8_bin``
collation (which is *not* the default), there are some issues to be aware
of. Refer to the :ref:`MySQL database notes <mysql-collation>` for
details.
``TimeField``
-------------


@ -729,16 +729,13 @@ anything. It has now been changed to behave the same as ``id__isnull=True``.
.. admonition:: MySQL comparisons
In MySQL, whether or not ``exact`` comparisons are case-sensitive depends
upon the collation setting of the table involved. The default is usually
``latin1_swedish_ci`` or ``utf8_swedish_ci``, which results in
case-insensitive comparisons. Change the collation to
``latin1_swedish_cs`` or ``utf8_bin`` for case sensitive comparisons.
For more details, refer to the MySQL manual section about `character sets
and collations`_.
.. _character sets and collations: http://dev.mysql.com/doc/refman/5.0/en/charset.html
In MySQL, ``exact`` comparisons are case-insensitive by default. This is
controlled by the collation setting on the database tables (this is a
database setting, *not* a Django setting). It is possible to configure
your MySQL tables to use case-sensitive comparisons, but there are some
trade-offs involved. For more information about this, see the
:ref:`collation section <mysql-collation>` in the
:ref:`databases <ref-databases>` documentation.
iexact
~~~~~~