==========================
Serializing Django objects
==========================

Django's serialization framework provides a mechanism for "translating" Django
objects into other formats. Usually these other formats will be text-based and
used for sending Django objects over a wire, but it's possible for a
serializer to handle any format (text-based or not).

.. seealso::

    If you just want to get some data from your tables into a serialized
    form, you could use the :djadmin:`dumpdata` management command.


Serializing data
----------------

At the highest level, serializing data is a very simple operation::

    from django.core import serializers
    data = serializers.serialize("xml", SomeModel.objects.all())

The arguments to the ``serialize`` function are the format to serialize the data
to (see `Serialization formats`_) and a
:class:`~django.db.models.query.QuerySet` to serialize. (Actually, the second
argument can be any iterator that yields Django objects, but it'll almost
always be a QuerySet.)

You can also use a serializer object directly::

    XMLSerializer = serializers.get_serializer("xml")
    xml_serializer = XMLSerializer()
    xml_serializer.serialize(queryset)
    data = xml_serializer.getvalue()

This is useful if you want to serialize data directly to a file-like object
(which includes an :class:`~django.http.HttpResponse`)::

    out = open("file.xml", "w")
    xml_serializer.serialize(SomeModel.objects.all(), stream=out)
    out.close()  # flush the serialized data to disk

.. note::

    Calling :func:`~django.core.serializers.get_serializer` with an unknown
    :ref:`format <serialization-formats>` will raise a
    :class:`~django.core.serializers.SerializerDoesNotExist` exception.


Subset of fields
~~~~~~~~~~~~~~~~

If you only want a subset of fields to be serialized, you can
specify a ``fields`` argument to the serializer::

    from django.core import serializers
    data = serializers.serialize('xml', SomeModel.objects.all(), fields=('name', 'size'))

In this example, only the ``name`` and ``size`` attributes of each model will
be serialized.

.. note::

    Depending on your model, you may find that it is not possible to
    deserialize a model that only serializes a subset of its fields. If a
    serialized object doesn't specify all the fields that are required by a
    model, the deserializer will not be able to save deserialized instances.


Inherited Models
~~~~~~~~~~~~~~~~

If you have a model that is defined using an :ref:`abstract base class
<abstract-base-classes>`, you don't have to do anything special to serialize
that model. Just call the serializer on the object (or objects) that you want to
serialize, and the output will be a complete representation of the serialized
object.

However, if you have a model that uses :ref:`multi-table inheritance
<multi-table-inheritance>`, you also need to serialize all of the base classes
for the model. This is because only the fields that are locally defined on the
model will be serialized. For example, consider the following models::

    class Place(models.Model):
        name = models.CharField(max_length=50)

    class Restaurant(Place):
        serves_hot_dogs = models.BooleanField()

If you only serialize the Restaurant model::

    data = serializers.serialize('xml', Restaurant.objects.all())

the serialized output will contain only the ``serves_hot_dogs`` attribute.
The ``name`` attribute of the base class will be ignored.

In order to fully serialize your Restaurant instances, you will need to
serialize the Place models as well::

    all_objects = list(Restaurant.objects.all()) + list(Place.objects.all())
    data = serializers.serialize('xml', all_objects)

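As a rough illustration of why both models are needed, the sketch below uses
plain dictionaries shaped like the ``pk``/``model``/``fields`` records that the
JSON examples in this document use. The ``example_app`` label, the sample
values, and the ``merge_parent_fields`` helper are assumptions made up for this
example, not part of Django. The only fact relied on is that, with multi-table
inheritance, a child row shares its primary key with its parent row, so the two
records can be recombined:

```python
# Hypothetical records; the labels and values are invented for illustration.
place_records = [
    {"pk": 1, "model": "example_app.place", "fields": {"name": "Joe's Diner"}},
]
restaurant_records = [
    {"pk": 1, "model": "example_app.restaurant",
     "fields": {"serves_hot_dogs": True}},
]


def merge_parent_fields(children, parents):
    """Return child records with the parent's fields folded in (matched by pk)."""
    parents_by_pk = dict((rec["pk"], rec) for rec in parents)
    merged = []
    for child in children:
        # Start from the parent's fields, then add the child's local fields.
        fields = dict(parents_by_pk[child["pk"]]["fields"])
        fields.update(child["fields"])
        merged.append(
            {"pk": child["pk"], "model": child["model"], "fields": fields}
        )
    return merged


full_restaurants = merge_parent_fields(restaurant_records, place_records)
```

Without the ``Place`` record there is nothing to look up, which is why
serializing only ``Restaurant`` loses the ``name`` value.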

Deserializing data
------------------

Deserializing data is also a fairly simple operation::

    for obj in serializers.deserialize("xml", data):
        do_something_with(obj)

As you can see, the ``deserialize`` function takes the same format argument as
``serialize``, a string or stream of data, and returns an iterator.

However, here it gets slightly complicated. The objects returned by the
``deserialize`` iterator *aren't* simple Django objects. Instead, they are
special ``DeserializedObject`` instances that wrap a created -- but unsaved --
object and any associated relationship data.

Calling ``DeserializedObject.save()`` saves the object to the database.

Because nothing is written until you call ``save()``, deserializing is a
non-destructive operation even if the data in your serialized representation
doesn't match what's currently in the database. Usually, working with these
``DeserializedObject`` instances looks something like::

    for deserialized_object in serializers.deserialize("xml", data):
        if object_should_be_saved(deserialized_object):
            deserialized_object.save()

In other words, the usual use is to examine the deserialized objects to make
sure that they are "appropriate" for saving before doing so. Of course, if you
trust your data source you could just save the object and move on.

The Django object itself can be inspected as ``deserialized_object.object``.


.. _serialization-formats:

Serialization formats
---------------------

Django supports a number of serialization formats, some of which require you
to install third-party Python modules:

==========  ==============================================================
Identifier  Information
==========  ==============================================================
``xml``     Serializes to and from a simple XML dialect.

``json``    Serializes to and from JSON_ (using a version of simplejson_
            bundled with Django).

``yaml``    Serializes to YAML (YAML Ain't a Markup Language). This
            serializer is only available if PyYAML_ is installed.
==========  ==============================================================

.. _json: http://json.org/
.. _simplejson: http://undefined.org/python/#simplejson
.. _PyYAML: http://www.pyyaml.org/


Notes for specific serialization formats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

json
^^^^

If you're using UTF-8 (or any other non-ASCII encoding) data with the JSON
serializer, you must pass ``ensure_ascii=False`` as a parameter to the
``serialize()`` call. Otherwise, the output won't be encoded correctly.

For example::

    json_serializer = serializers.get_serializer("json")()
    json_serializer.serialize(queryset, ensure_ascii=False, stream=response)

The Django source code includes the simplejson_ module. However, if you're
using Python 2.6 or later (which includes a builtin version of the module),
Django will use the builtin ``json`` module automatically. If you have a
system-installed version that includes the C-based speedup extension, or your
system version is more recent than the version shipped with Django (currently,
2.0.7), the system version will be used instead of the version included with
Django.

Be aware that if you're serializing using that module directly, not all Django
output can be passed unmodified to simplejson. In particular, :ref:`lazy
translation objects <lazy-translations>` need a `special encoder`_ written for
them. Something like this will work::

    from django.utils import simplejson
    from django.utils.functional import Promise
    from django.utils.encoding import force_unicode

    class LazyEncoder(simplejson.JSONEncoder):
        def default(self, obj):
            # Lazy translation objects are Promise instances; force them to text.
            if isinstance(obj, Promise):
                return force_unicode(obj)
            return super(LazyEncoder, self).default(obj)

.. _special encoder: http://svn.red-bean.com/bob/simplejson/tags/simplejson-1.7/docs/index.html

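The same ``default()`` hook exists on the standard library's
``json.JSONEncoder``, so the pattern can be tried without Django. The following
is a minimal sketch rather than Django code: ``FakeLazyString`` is a
hypothetical stand-in for a lazy translation object, and the encoder simply
converts it to text. The last two lines also show what ``ensure_ascii=False``
changes for non-ASCII text:

```python
import json


class FakeLazyString(object):
    """Hypothetical stand-in for a lazy translation object."""

    def __init__(self, func):
        self._func = func

    def __str__(self):
        # The text is only computed when it is actually needed.
        return self._func()


class LazyEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() is called only for objects json cannot serialize itself.
        if isinstance(obj, FakeLazyString):
            return str(obj)
        return super(LazyEncoder, self).default(obj)


title = FakeLazyString(lambda: "caf\u00e9")
escaped = json.dumps({"title": title}, cls=LazyEncoder)
readable = json.dumps({"title": title}, cls=LazyEncoder, ensure_ascii=False)
```

With the default settings, the non-ASCII character is written as a ``\u``
escape; with ``ensure_ascii=False`` it is written as the character itself.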

.. _topics-serialization-natural-keys:

Natural keys
------------

.. versionadded:: 1.2

   The ability to use natural keys when serializing/deserializing data was
   added in the 1.2 release.

The default serialization strategy for foreign keys and many-to-many relations
is to serialize the value of the primary key(s) of the objects in the relation.
This strategy works well for most objects, but it can cause difficulty in some
circumstances.

Consider the case of a list of objects that have a foreign key referencing
:class:`~django.contrib.contenttypes.models.ContentType`. If you're going to
serialize an object that refers to a content type, then you need to have a way
to refer to that content type to begin with. Since ``ContentType`` objects are
automatically created by Django during the database synchronization process,
the primary key of a given content type isn't easy to predict; it will
depend on how and when :djadmin:`syncdb` was executed. This is true for all
models which automatically generate objects, notably including
:class:`~django.contrib.auth.models.Permission`.

.. warning::

    You should never include automatically generated objects in a fixture or
    other serialized data. By chance, the primary keys in the fixture
    may match those in the database and loading the fixture will
    have no effect. In the more likely case that they don't match, the fixture
    loading will fail with an :class:`~django.db.IntegrityError`.

There is also the matter of convenience. An integer id isn't always
the most convenient way to refer to an object; sometimes, a
more natural reference would be helpful.

It is for these reasons that Django provides *natural keys*. A natural
key is a tuple of values that can be used to uniquely identify an
object instance without using the primary key value.


Deserialization of natural keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following two models::

    from django.db import models

    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        class Meta:
            unique_together = (('first_name', 'last_name'),)

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person)

Ordinarily, serialized data for ``Book`` would use an integer to refer to
the author. For example, in JSON, a Book might be serialized as::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": 42
        }
    }
    ...

This isn't a particularly natural way to refer to an author. It
requires that you know the primary key value for the author; it also
requires that this primary key value is stable and predictable.

However, if we add natural key handling to Person, the fixture becomes
much more humane. To add natural key handling, you define a default
Manager for Person with a ``get_by_natural_key()`` method. In the case
of a Person, a good natural key might be the pair of first and last
name::

    from django.db import models

    class PersonManager(models.Manager):
        def get_by_natural_key(self, first_name, last_name):
            return self.get(first_name=first_name, last_name=last_name)

    class Person(models.Model):
        objects = PersonManager()

        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        class Meta:
            unique_together = (('first_name', 'last_name'),)

Now books can use that natural key to refer to ``Person`` objects::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": ["Douglas", "Adams"]
        }
    }
    ...

When you try to load this serialized data, Django will use the
``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]``
into the primary key of an actual ``Person`` object.

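Conceptually, that lookup turns a natural key into a primary key before the
``Book`` row is built. The sketch below imitates the step with a plain
dictionary instead of a database. ``PEOPLE_BY_NATURAL_KEY`` and
``resolve_author`` are hypothetical names invented for this illustration, and
Django's own deserializer is more involved:

```python
# Hypothetical stand-in for the Person table, indexed by
# (first_name, last_name). The pk 42 matches the example above.
PEOPLE_BY_NATURAL_KEY = {("Douglas", "Adams"): 42}


def resolve_author(fields):
    """Return a copy of ``fields`` with a natural-key author replaced by its pk."""
    resolved = dict(fields)
    author = fields["author"]
    if isinstance(author, (list, tuple)):
        # A list or tuple is treated as a natural key. The lookup raises
        # KeyError if no matching person exists yet.
        resolved["author"] = PEOPLE_BY_NATURAL_KEY[tuple(author)]
    return resolved


book_fields = resolve_author(
    {"name": "Mostly Harmless", "author": ["Douglas", "Adams"]}
)
```

A plain integer passes through unchanged, so both styles of reference can be
handled by the same step in this sketch.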

.. note::

    Whatever fields you use for a natural key must be able to uniquely
    identify an object. This will usually mean that your model will
    have a uniqueness clause (either ``unique=True`` on a single field, or
    ``unique_together`` over multiple fields) for the field or fields
    in your natural key. However, uniqueness doesn't need to be
    enforced at the database level. If you are certain that a set of
    fields will be effectively unique, you can still use those fields
    as a natural key.


Serialization of natural keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So how do you get Django to emit a natural key when serializing an object?
Firstly, you need to add another method -- this time to the model itself::

    class Person(models.Model):
        objects = PersonManager()

        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        def natural_key(self):
            return (self.first_name, self.last_name)

        class Meta:
            unique_together = (('first_name', 'last_name'),)

That method should always return a natural key tuple -- in this
example, ``(first name, last name)``. Then, when you call
``serializers.serialize()``, you provide a ``use_natural_keys=True``
argument::

    >>> serializers.serialize('json', [book1, book2], indent=2, use_natural_keys=True)

When ``use_natural_keys=True`` is specified, Django will use the
``natural_key()`` method to serialize any reference to objects of the
type that defines the method.

If you are using :djadmin:`dumpdata` to generate serialized data, use the
:djadminopt:`--natural` command line flag to generate natural keys.

.. note::

    You don't need to define both ``natural_key()`` and
    ``get_by_natural_key()``. If you don't want Django to output
    natural keys during serialization, but you want to retain the
    ability to load natural keys, then you can opt to not implement
    the ``natural_key()`` method.

    Conversely, if (for some strange reason) you want Django to output
    natural keys during serialization, but *not* be able to load those
    key values, just don't define the ``get_by_natural_key()`` method.


Dependencies during serialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since natural keys rely on database lookups to resolve references, it
is important that the data exists before it is referenced. You can't make
a "forward reference" with natural keys: the data you are referencing
must exist before you include a natural key reference to that data.

To accommodate this limitation, calls to :djadmin:`dumpdata` that use
the :djadminopt:`--natural` option will serialize any model with a
``natural_key()`` method before serializing standard primary key objects.

However, this may not always be enough. If your natural key refers to
another object (by using a foreign key or natural key to another object
as part of a natural key), then you need to be able to ensure that
the objects on which a natural key depends occur in the serialized data
before the natural key requires them.

To control this ordering, you can define dependencies on your
``natural_key()`` methods. You do this by setting a ``dependencies``
attribute on the ``natural_key()`` method itself.

For example, let's add a natural key to the ``Book`` model from the
example above::

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person)

        def natural_key(self):
            return (self.name,) + self.author.natural_key()

The natural key for a ``Book`` is a combination of its name and its
author. This means that ``Person`` must be serialized before ``Book``.
To define this dependency, we add one extra line::

    def natural_key(self):
        return (self.name,) + self.author.natural_key()
    natural_key.dependencies = ['example_app.person']

This definition ensures that all ``Person`` objects are serialized before
any ``Book`` objects. In turn, any object referencing ``Book`` will be
serialized after both ``Person`` and ``Book`` have been serialized.
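
One way to picture the resulting order is a dependency-first traversal. The
following is a simplified sketch and not Django's actual implementation; the
``order_models`` helper and the ``example_app.review`` model are hypothetical
names introduced only for this illustration:

```python
def order_models(dependencies):
    """Order model labels so each appears after everything it depends on.

    ``dependencies`` maps a model label to the list of labels it depends on.
    """
    ordered = []
    in_progress = set()

    def visit(label):
        if label in ordered:
            return
        if label in in_progress:
            raise ValueError("Circular dependency involving %s" % label)
        in_progress.add(label)
        # Emit every dependency before the model that needs it.
        for dep in dependencies.get(label, []):
            visit(dep)
        in_progress.discard(label)
        ordered.append(label)

    # Sorting the labels keeps the result deterministic.
    for label in sorted(dependencies):
        visit(label)
    return ordered


order = order_models({
    "example_app.book": ["example_app.person"],
    "example_app.person": [],
    "example_app.review": ["example_app.book"],  # hypothetical model
})
```

Here ``order`` lists ``example_app.person`` first, then ``example_app.book``,
then the hypothetical ``example_app.review``, mirroring the guarantee described
above. Two models that depend on each other cannot be ordered this way, which
the sketch reports as an error.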