Added documentation to explain the gains and losses when using utf8_bin
collation in MySQL.

This should help people to make a reasonably informed decision. Usually,
leaving the MySQL collation alone will be the best solution, but if you must
change it, this gives a start to the information you need and pointers to the
appropriate place in the MySQL docs.

There's a small chance I also got all the necessary Sphinx markup correct, too
(it builds without errors, but I may have missed some chances for glory and
linkage).

Fixed #2335, #8506.

git-svn-id: http://code.djangoproject.com/svn/django/trunk@8568 bcc190cf-cafb-0310-a4f2-bffc1f526a37
parent b2c2c3a2ed
commit f2b389b354

@@ -95,6 +95,65 @@ This ensures all tables and columns will use UTF-8 by default.

.. _create your database: http://dev.mysql.com/doc/refman/5.0/en/create-database.html

.. _mysql-collation:

Collation settings
~~~~~~~~~~~~~~~~~~

The collation setting for a column controls the order in which data is sorted
as well as what strings compare as equal. It can be set on a database-wide
level and also per-table and per-column. This is `documented thoroughly`_ in
the MySQL documentation. In all cases, you set the collation by directly
manipulating the database tables; Django doesn't provide a way to set this on
the model definition.

.. _documented thoroughly: http://dev.mysql.com/doc/refman/5.0/en/charset.html

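As a rough sketch of what "directly manipulating the database tables" can look
like, the helpers below only build MySQL statements as strings; they do not
connect to a database. The function names and the ``myapp_person`` table are
hypothetical, and the statements assume MySQL's ``ALTER DATABASE ... CHARACTER
SET ... COLLATE ...`` and ``ALTER TABLE ... CONVERT TO CHARACTER SET ...
COLLATE ...`` forms, so check them against the manual for your server version
before running anything.

```python
import re

# Accept only plain identifiers so a crafted name cannot alter the SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _checked(name):
    """Return ``name`` unchanged if it is a plain identifier, else raise."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError("unsafe identifier: %r" % (name,))
    return name


def database_collation_sql(database, collation="utf8_bin", charset="utf8"):
    """Build a statement that changes the database-wide default.

    This only affects tables created afterwards; existing tables keep
    whatever collation they already have.
    """
    return "ALTER DATABASE `%s` CHARACTER SET %s COLLATE %s;" % (
        _checked(database), _checked(charset), _checked(collation))


def table_collation_sql(table, collation="utf8_bin", charset="utf8"):
    """Build a statement that converts the text columns of one table."""
    return "ALTER TABLE `%s` CONVERT TO CHARACTER SET %s COLLATE %s;" % (
        _checked(table), _checked(charset), _checked(collation))


if __name__ == "__main__":
    # Hypothetical table name, printed for review rather than executed.
    print(table_collation_sql("myapp_person"))
```

Printing the statements for review, rather than executing them from code, keeps
the change a deliberate database administration step, which matches the point
above that Django itself does not manage collations.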
By default, with a UTF-8 database, MySQL will use the
``utf8_general_ci`` collation. This results in all string equality
comparisons being done in a *case-insensitive* manner. That is, ``"Fred"`` and
``"freD"`` are considered equal at the database level. If you have a unique
constraint on a field, it would be illegal to try to insert both ``"aa"`` and
``"AA"`` into the same column, since they compare as equal (and, hence,
non-unique) with the default collation.

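The effect of a case-insensitive collation on a unique constraint can be tried
without a MySQL server. The sketch below is an analogy only: it uses Python's
built-in ``sqlite3`` module, with SQLite's ``NOCASE`` collation standing in for
a case-insensitive MySQL collation and ``BINARY`` standing in for ``utf8_bin``.
The ``person`` table and the ``rejected_names`` helper are hypothetical.

```python
import sqlite3

# SQLite's built-in collations used here as stand-ins for MySQL's.
_ALLOWED = ("NOCASE", "BINARY")


def rejected_names(collation, names):
    """Insert names into a unique column; return those the database refused."""
    if collation not in _ALLOWED:
        raise ValueError("unsupported collation: %r" % (collation,))
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE person (name TEXT COLLATE %s UNIQUE)" % collation)
        rejected = []
        for name in names:
            try:
                conn.execute("INSERT INTO person (name) VALUES (?)", (name,))
            except sqlite3.IntegrityError:
                # The unique index treated this value as a duplicate.
                rejected.append(name)
        return rejected
    finally:
        conn.close()


if __name__ == "__main__":
    print(rejected_names("NOCASE", ["aa", "AA"]))
    print(rejected_names("BINARY", ["aa", "AA"]))
```

With the case-insensitive collation the second insert is refused as a
duplicate, while the binary collation accepts both values.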
In many cases, this default will not be a problem. However, if you really want
case-sensitive comparisons on a particular column or table, you would change
the column or table to use the ``utf8_bin`` collation. The main thing to be
aware of in this case is that if you are using MySQLdb 1.2.2, the database
backend in Django will then return bytestrings (instead of unicode strings)
for any character fields it receives from the database. This is a strong
variation from Django's normal practice of *always* returning unicode strings.
It is up to you, the developer, to handle the fact that you will receive
bytestrings if you configure your table(s) to use ``utf8_bin`` collation.
Django itself should work smoothly with such columns, but your code must be
prepared to call ``django.utils.encoding.smart_unicode()`` at times if it
really wants to work with consistent data -- Django will not do this for you
(the database backend layer and the model population layer are separated
internally so the database layer doesn't know it needs to make this conversion
in this one particular case).

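The kind of normalisation described above might look like the following
minimal sketch. It is written for Python 3, where the two types are ``bytes``
and ``str``, and it is not Django's ``smart_unicode()``; the ``to_text`` and
``normalise_row`` helpers and the sample row are hypothetical, and UTF-8 is
assumed as the column encoding.

```python
def to_text(value, encoding="utf-8"):
    """Decode bytestrings to text and pass every other value through."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    return value


def normalise_row(row, encoding="utf-8"):
    """Return a copy of a mapping with all bytestring values decoded."""
    return {key: to_text(value, encoding) for key, value in row.items()}


if __name__ == "__main__":
    # Hypothetical row, as it might arrive from a ``utf8_bin`` column.
    row = {"id": 1, "name": b"Fr\xc3\xa9d"}
    print(normalise_row(row))
```

Decoding at one well-defined boundary, right after the data leaves the
database layer, keeps the rest of the code free to assume text everywhere.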
If you're using MySQLdb 1.2.1p2, Django's standard
:class:`~django.db.models.CharField` class will return unicode strings even
with ``utf8_bin`` collation. However, :class:`~django.db.models.TextField`
fields will be returned as an ``array.array`` instance (from Python's standard
``array`` module). There isn't a lot Django can do about that, since, again,
the information needed to make the necessary conversions isn't available when
the data is read in from the database. This problem was `fixed in MySQLdb
1.2.2`_, so if you want to use :class:`~django.db.models.TextField` with
``utf8_bin`` collation, upgrading to version 1.2.2 and then dealing with the
bytestrings (which shouldn't be too difficult) is the recommended solution.

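For code that cannot upgrade and does receive ``array.array`` values, a
conversion along these lines is one option. This is a sketch for Python 3
using the standard ``array`` module; the ``text_from_array`` helper is
hypothetical, the unsigned-byte array in the example is only a stand-in for
whatever the driver actually returns, and UTF-8 is assumed as the encoding.

```python
import array


def text_from_array(value, encoding="utf-8"):
    """Turn an ``array.array`` of raw bytes into text; pass others through."""
    if isinstance(value, array.array):
        return value.tobytes().decode(encoding)
    return value


if __name__ == "__main__":
    # Stand-in for a TextField value delivered as an array of bytes.
    raw = array.array("B", "na\u00efve".encode("utf-8"))
    print(text_from_array(raw))
```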
Should you decide to use ``utf8_bin`` collation for some of your tables with
MySQLdb 1.2.1p2, you should still use ``utf8_general_ci`` (the
default) collation for the :class:`django.contrib.sessions.models.Session`
table (usually called ``django_session``) and the
:class:`django.contrib.admin.models.LogEntry` table (usually called
``django_admin_log``). Those are the two standard tables that use
:class:`~django.db.models.TextField` internally.

.. _fixed in MySQLdb 1.2.2: http://sourceforge.net/tracker/index.php?func=detail&aid=1495765&group_id=22307&atid=374932

Connecting to the database
--------------------------


@@ -340,6 +340,14 @@ The admin represents this as an ``<input type="text">`` (a single-line input).
The maximum length (in characters) of the field. The max_length is enforced
at the database level and in Django's validation.

.. admonition:: MySQL users

    If you are using this field with MySQLdb 1.2.2 and the ``utf8_bin``
    collation (which is *not* the default), there are some issues to be aware
    of. Refer to the :ref:`MySQL database notes <mysql-collation>` for
    details.

``CommaSeparatedIntegerField``
------------------------------

@@ -689,6 +697,13 @@ Like an :class:`IntegerField`, but only allows values under a certain
A large text field. The admin represents this as a ``<textarea>`` (a multi-line
input).

.. admonition:: MySQL users

    If you are using this field with MySQLdb 1.2.1p2 and the ``utf8_bin``
    collation (which is *not* the default), there are some issues to be aware
    of. Refer to the :ref:`MySQL database notes <mysql-collation>` for
    details.

``TimeField``
-------------

@@ -729,16 +729,13 @@ anything. It has now been changed to behave the same as ``id__isnull=True``.

.. admonition:: MySQL comparisons

-    In MySQL, whether or not ``exact`` comparisons are case-sensitive depends
-    upon the collation setting of the table involved. The default is usually
-    ``latin1_swedish_ci`` or ``utf8_swedish_ci``, which results in
-    case-insensitive comparisons. Change the collation to
-    ``latin1_swedish_cs`` or ``utf8_bin`` for case sensitive comparisons.
-
-    For more details, refer to the MySQL manual section about `character sets
-    and collations`_.
-
-.. _character sets and collations: http://dev.mysql.com/doc/refman/5.0/en/charset.html
+    In MySQL, ``exact`` comparisons are case-insensitive by
+    default. This is controlled by the collation setting on the database
+    tables (this is a database setting, *not* a Django setting). It is
+    possible to configure your MySQL tables to use case-sensitive comparisons,
+    but there are some trade-offs involved. For more information about
+    this, see the :ref:`collation section <mysql-collation>` in the
+    :ref:`databases <ref-databases>` documentation.

iexact
~~~~~~