401 lines
16 KiB
Plaintext
401 lines
16 KiB
Plaintext
============
|
|
File Uploads
|
|
============
|
|
|
|
.. currentmodule:: django.core.files.uploadedfile
|
|
|
|
When Django handles a file upload, the file data ends up placed in
|
|
:attr:`request.FILES <django.http.HttpRequest.FILES>` (for more on the
|
|
``request`` object see the documentation for :doc:`request and response objects
|
|
</ref/request-response>`). This document explains how files are stored on disk
|
|
and in memory, and how to customize the default behavior.
|
|
|
|
Basic file uploads
|
|
==================
|
|
|
|
Consider a simple form containing a :class:`~django.forms.FileField`::
|
|
|
|
from django import forms
|
|
|
|
class UploadFileForm(forms.Form):
|
|
title = forms.CharField(max_length=50)
|
|
file = forms.FileField()
|
|
|
|
A view handling this form will receive the file data in
|
|
:attr:`request.FILES <django.http.HttpRequest.FILES>`, which is a dictionary
|
|
containing a key for each :class:`~django.forms.FileField` (or
|
|
:class:`~django.forms.ImageField`, or other :class:`~django.forms.FileField`
|
|
subclass) in the form. So the data from the above form would
|
|
be accessible as ``request.FILES['file']``.
|
|
|
|
Note that :attr:`request.FILES <django.http.HttpRequest.FILES>` will only
|
|
contain data if the request method was ``POST`` and the ``<form>`` that posted
|
|
the request has the attribute ``enctype="multipart/form-data"``. Otherwise,
|
|
``request.FILES`` will be empty.
|
|
|
|
Most of the time, you'll simply pass the file data from ``request`` into the
|
|
form as described in :ref:`binding-uploaded-files`. This would look
|
|
something like::
|
|
|
|
from django.http import HttpResponseRedirect
|
|
from django.shortcuts import render_to_response
|
|
|
|
# Imaginary function to handle an uploaded file.
|
|
from somewhere import handle_uploaded_file
|
|
|
|
def upload_file(request):
|
|
if request.method == 'POST':
|
|
form = UploadFileForm(request.POST, request.FILES)
|
|
if form.is_valid():
|
|
handle_uploaded_file(request.FILES['file'])
|
|
return HttpResponseRedirect('/success/url/')
|
|
else:
|
|
form = UploadFileForm()
|
|
return render_to_response('upload.html', {'form': form})
|
|
|
|
Notice that we have to pass :attr:`request.FILES <django.http.HttpRequest.FILES>`
|
|
into the form's constructor; this is how file data gets bound into a form.
|
|
|
|
Handling uploaded files
|
|
-----------------------
|
|
|
|
.. class:: UploadedFile
|
|
|
|
The final piece of the puzzle is handling the actual file data from
|
|
:attr:`request.FILES <django.http.HttpRequest.FILES>`. Each entry in this
|
|
dictionary is an ``UploadedFile`` object -- a simple wrapper around an uploaded
|
|
file. You'll usually use one of these methods to access the uploaded content:
|
|
|
|
.. method:: read()
|
|
|
|
Read the entire uploaded data from the file. Be careful with this
|
|
method: if the uploaded file is huge it can overwhelm your system if you
|
|
try to read it into memory. You'll probably want to use ``chunks()``
|
|
instead; see below.
|
|
|
|
.. method:: multiple_chunks()
|
|
|
|
Returns ``True`` if the uploaded file is big enough to require
|
|
reading in multiple chunks. By default this will be any file
|
|
larger than 2.5 megabytes, but that's configurable; see below.
|
|
|
|
.. method:: chunks()
|
|
|
|
A generator returning chunks of the file. If ``multiple_chunks()`` is
|
|
``True``, you should use this method in a loop instead of ``read()``.
|
|
|
|
In practice, it's often easiest simply to use ``chunks()`` all the time;
|
|
see the example below.
|
|
|
|
.. attribute:: name
|
|
|
|
The name of the uploaded file (e.g. ``my_file.txt``).
|
|
|
|
.. attribute:: size
|
|
|
|
The size, in bytes, of the uploaded file.
|
|
|
|
There are a few other methods and attributes available on ``UploadedFile``
|
|
objects; see `UploadedFile objects`_ for a complete reference.
|
|
|
|
Putting it all together, here's a common way you might handle an uploaded file::
|
|
|
|
def handle_uploaded_file(f):
|
|
destination = open('some/file/name.txt', 'wb+')
|
|
for chunk in f.chunks():
|
|
destination.write(chunk)
|
|
destination.close()
|
|
|
|
Looping over ``UploadedFile.chunks()`` instead of using ``read()`` ensures that
|
|
large files don't overwhelm your system's memory.
|
|
|
|
Where uploaded data is stored
|
|
-----------------------------
|
|
|
|
Before you save uploaded files, the data needs to be stored somewhere.
|
|
|
|
By default, if an uploaded file is smaller than 2.5 megabytes, Django will hold
|
|
the entire contents of the upload in memory. This means that saving the file
|
|
involves only a read from memory and a write to disk and thus is very fast.
|
|
|
|
However, if an uploaded file is too large, Django will write the uploaded file
|
|
to a temporary file stored in your system's temporary directory. On a Unix-like
|
|
platform this means you can expect Django to generate a file called something
|
|
like ``/tmp/tmpzfp6I6.upload``. If an upload is large enough, you can watch this
|
|
file grow in size as Django streams the data onto disk.
|
|
|
|
These specifics -- 2.5 megabytes; ``/tmp``; etc. -- are simply "reasonable
|
|
defaults". Read on for details on how you can customize or completely replace
|
|
upload behavior.
|
|
|
|
Changing upload handler behavior
|
|
--------------------------------
|
|
|
|
Three settings control Django's file upload behavior:
|
|
|
|
:setting:`FILE_UPLOAD_MAX_MEMORY_SIZE`
|
|
The maximum size, in bytes, for files that will be uploaded into memory.
|
|
Files larger than :setting:`FILE_UPLOAD_MAX_MEMORY_SIZE` will be
|
|
streamed to disk.
|
|
|
|
Defaults to 2.5 megabytes.
|
|
|
|
:setting:`FILE_UPLOAD_TEMP_DIR`
|
|
The directory where uploaded files larger than
|
|
:setting:`FILE_UPLOAD_MAX_MEMORY_SIZE` will be stored.
|
|
|
|
Defaults to your system's standard temporary directory (i.e. ``/tmp`` on
|
|
most Unix-like systems).
|
|
|
|
:setting:`FILE_UPLOAD_PERMISSIONS`
|
|
The numeric mode (i.e. ``0644``) to set newly uploaded files to. For
|
|
more information about what these modes mean, see the documentation for
|
|
:func:`os.chmod`.
|
|
|
|
If this isn't given or is ``None``, you'll get operating-system
|
|
dependent behavior. On most platforms, temporary files will have a mode
|
|
of ``0600``, and files saved from memory will be saved using the
|
|
system's standard umask.
|
|
|
|
.. warning::
|
|
|
|
If you're not familiar with file modes, please note that the leading
|
|
``0`` is very important: it indicates an octal number, which is the
|
|
way that modes must be specified. If you try to use ``644``, you'll
|
|
get totally incorrect behavior.
|
|
|
|
**Always prefix the mode with a 0.**
|
|
|
|
:setting:`FILE_UPLOAD_HANDLERS`
|
|
The actual handlers for uploaded files. Changing this setting allows
|
|
complete customization -- even replacement -- of Django's upload
|
|
process. See `upload handlers`_, below, for details.
|
|
|
|
Defaults to::
|
|
|
|
("django.core.files.uploadhandler.MemoryFileUploadHandler",
|
|
"django.core.files.uploadhandler.TemporaryFileUploadHandler",)
|
|
|
|
Which means "try to upload to memory first, then fall back to temporary
|
|
files."
|
|
|
|
``UploadedFile`` objects
|
|
========================
|
|
|
|
In addition to those inherited from :class:`File`, all ``UploadedFile`` objects
|
|
define the following methods/attributes:
|
|
|
|
.. attribute:: UploadedFile.content_type
|
|
|
|
The content-type header uploaded with the file (e.g. :mimetype:`text/plain`
|
|
or :mimetype:`application/pdf`). Like any data supplied by the user, you
|
|
shouldn't trust that the uploaded file is actually this type. You'll still
|
|
need to validate that the file contains the content that the content-type
|
|
header claims -- "trust but verify."
|
|
|
|
.. attribute:: UploadedFile.charset
|
|
|
|
For :mimetype:`text/*` content-types, the character set (i.e. ``utf8``)
|
|
supplied by the browser. Again, "trust but verify" is the best policy here.
|
|
|
|
.. attribute:: UploadedFile.temporary_file_path()
|
|
|
|
Only files uploaded onto disk will have this method; it returns the full
|
|
path to the temporary uploaded file.
|
|
|
|
.. note::
|
|
|
|
Like regular Python files, you can read the file line-by-line simply by
|
|
iterating over the uploaded file:
|
|
|
|
.. code-block:: python
|
|
|
|
for line in uploadedfile:
|
|
do_something_with(line)
|
|
|
|
However, *unlike* standard Python files, :class:`UploadedFile` only
|
|
understands ``\n`` (also known as "Unix-style") line endings. If you know
|
|
that you need to handle uploaded files with different line endings, you'll
|
|
need to do so in your view.
|
|
|
|
Upload Handlers
|
|
===============
|
|
|
|
When a user uploads a file, Django passes off the file data to an *upload
|
|
handler* -- a small class that handles file data as it gets uploaded. Upload
|
|
handlers are initially defined in the :setting:`FILE_UPLOAD_HANDLERS` setting,
|
|
which defaults to::
|
|
|
|
("django.core.files.uploadhandler.MemoryFileUploadHandler",
|
|
"django.core.files.uploadhandler.TemporaryFileUploadHandler",)
|
|
|
|
Together the ``MemoryFileUploadHandler`` and ``TemporaryFileUploadHandler``
|
|
provide Django's default file upload behavior of reading small files into memory
|
|
and large ones onto disk.
|
|
|
|
You can write custom handlers that customize how Django handles files. You
|
|
could, for example, use custom handlers to enforce user-level quotas, compress
|
|
data on the fly, render progress bars, and even send data to another storage
|
|
location directly without storing it locally.
|
|
|
|
.. _modifying_upload_handlers_on_the_fly:
|
|
|
|
Modifying upload handlers on the fly
|
|
------------------------------------
|
|
|
|
Sometimes particular views require different upload behavior. In these cases,
|
|
you can override upload handlers on a per-request basis by modifying
|
|
``request.upload_handlers``. By default, this list will contain the upload
|
|
handlers given by :setting:`FILE_UPLOAD_HANDLERS`, but you can modify the list
|
|
as you would any other list.
|
|
|
|
For instance, suppose you've written a ``ProgressBarUploadHandler`` that
|
|
provides feedback on upload progress to some sort of AJAX widget. You'd add this
|
|
handler to your upload handlers like this::
|
|
|
|
request.upload_handlers.insert(0, ProgressBarUploadHandler())
|
|
|
|
You'd probably want to use ``list.insert()`` in this case (instead of
|
|
``append()``) because a progress bar handler would need to run *before* any
|
|
other handlers. Remember, the upload handlers are processed in order.
|
|
|
|
If you want to replace the upload handlers completely, you can just assign a new
|
|
list::
|
|
|
|
request.upload_handlers = [ProgressBarUploadHandler()]
|
|
|
|
.. note::
|
|
|
|
You can only modify upload handlers *before* accessing
|
|
``request.POST`` or ``request.FILES`` -- it doesn't make sense to
|
|
change upload handlers after upload handling has already
|
|
started. If you try to modify ``request.upload_handlers`` after
|
|
reading from ``request.POST`` or ``request.FILES`` Django will
|
|
throw an error.
|
|
|
|
Thus, you should always modify uploading handlers as early in your view as
|
|
possible.
|
|
|
|
Also, ``request.POST`` is accessed by
|
|
:class:`~django.middleware.csrf.CsrfViewMiddleware` which is enabled by
|
|
default. This means you will need to use
|
|
:func:`~django.views.decorators.csrf.csrf_exempt` on your view to allow you
|
|
to change the upload handlers. You will then need to use
|
|
:func:`~django.views.decorators.csrf.csrf_protect` on the function that
|
|
actually processes the request. Note that this means that the handlers may
|
|
start receiving the file upload before the CSRF checks have been done.
|
|
Example code:
|
|
|
|
.. code-block:: python
|
|
|
|
from django.views.decorators.csrf import csrf_exempt, csrf_protect
|
|
|
|
@csrf_exempt
|
|
def upload_file_view(request):
|
|
request.upload_handlers.insert(0, ProgressBarUploadHandler())
|
|
return _upload_file_view(request)
|
|
|
|
@csrf_protect
|
|
def _upload_file_view(request):
|
|
... # Process request
|
|
|
|
|
|
Writing custom upload handlers
|
|
------------------------------
|
|
|
|
All file upload handlers should be subclasses of
|
|
``django.core.files.uploadhandler.FileUploadHandler``. You can define upload
|
|
handlers wherever you wish.
|
|
|
|
Required methods
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
Custom file upload handlers **must** define the following methods:
|
|
|
|
``FileUploadHandler.receive_data_chunk(self, raw_data, start)``
|
|
Receives a "chunk" of data from the file upload.
|
|
|
|
``raw_data`` is a byte string containing the uploaded data.
|
|
|
|
``start`` is the position in the file where this ``raw_data`` chunk
|
|
begins.
|
|
|
|
The data you return will get fed into the subsequent upload handlers'
|
|
``receive_data_chunk`` methods. In this way, one handler can be a
|
|
"filter" for other handlers.
|
|
|
|
Return ``None`` from ``receive_data_chunk`` to sort-circuit remaining
|
|
upload handlers from getting this chunk.. This is useful if you're
|
|
storing the uploaded data yourself and don't want future handlers to
|
|
store a copy of the data.
|
|
|
|
If you raise a ``StopUpload`` or a ``SkipFile`` exception, the upload
|
|
will abort or the file will be completely skipped.
|
|
|
|
``FileUploadHandler.file_complete(self, file_size)``
|
|
Called when a file has finished uploading.
|
|
|
|
The handler should return an ``UploadedFile`` object that will be stored
|
|
in ``request.FILES``. Handlers may also return ``None`` to indicate that
|
|
the ``UploadedFile`` object should come from subsequent upload handlers.
|
|
|
|
Optional methods
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
Custom upload handlers may also define any of the following optional methods or
|
|
attributes:
|
|
|
|
``FileUploadHandler.chunk_size``
|
|
Size, in bytes, of the "chunks" Django should store into memory and feed
|
|
into the handler. That is, this attribute controls the size of chunks
|
|
fed into ``FileUploadHandler.receive_data_chunk``.
|
|
|
|
For maximum performance the chunk sizes should be divisible by ``4`` and
|
|
should not exceed 2 GB (2\ :sup:`31` bytes) in size. When there are
|
|
multiple chunk sizes provided by multiple handlers, Django will use the
|
|
smallest chunk size defined by any handler.
|
|
|
|
The default is 64*2\ :sup:`10` bytes, or 64 KB.
|
|
|
|
``FileUploadHandler.new_file(self, field_name, file_name, content_type, content_length, charset)``
|
|
Callback signaling that a new file upload is starting. This is called
|
|
before any data has been fed to any upload handlers.
|
|
|
|
``field_name`` is a string name of the file ``<input>`` field.
|
|
|
|
``file_name`` is the unicode filename that was provided by the browser.
|
|
|
|
``content_type`` is the MIME type provided by the browser -- E.g.
|
|
``'image/jpeg'``.
|
|
|
|
``content_length`` is the length of the image given by the browser.
|
|
Sometimes this won't be provided and will be ``None``.
|
|
|
|
``charset`` is the character set (i.e. ``utf8``) given by the browser.
|
|
Like ``content_length``, this sometimes won't be provided.
|
|
|
|
This method may raise a ``StopFutureHandlers`` exception to prevent
|
|
future handlers from handling this file.
|
|
|
|
``FileUploadHandler.upload_complete(self)``
|
|
Callback signaling that the entire upload (all files) has completed.
|
|
|
|
``FileUploadHandler.handle_raw_input(self, input_data, META, content_length, boundary, encoding)``
|
|
Allows the handler to completely override the parsing of the raw
|
|
HTTP input.
|
|
|
|
``input_data`` is a file-like object that supports ``read()``-ing.
|
|
|
|
``META`` is the same object as ``request.META``.
|
|
|
|
``content_length`` is the length of the data in ``input_data``. Don't
|
|
read more than ``content_length`` bytes from ``input_data``.
|
|
|
|
``boundary`` is the MIME boundary for this request.
|
|
|
|
``encoding`` is the encoding of the request.
|
|
|
|
Return ``None`` if you want upload handling to continue, or a tuple of
|
|
``(POST, FILES)`` if you want to return the new data structures suitable
|
|
for the request directly.
|