================
Full text search
================

The database functions in the ``django.contrib.postgres.search`` module ease
the use of PostgreSQL's `full text search engine
<https://www.postgresql.org/docs/current/textsearch.html>`_.

For the examples in this document, we'll use the models defined in
:doc:`/topics/db/queries`.

.. seealso::

    For a high-level overview of searching, see the :doc:`topic documentation
    </topics/db/search>`.

.. currentmodule:: django.contrib.postgres.search

The ``search`` lookup
=====================

.. fieldlookup:: search

A common way to use full text search is to search a single term against a
single column in the database. For example::

    >>> Entry.objects.filter(body_text__search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

This creates a ``to_tsvector`` in the database from the ``body_text`` field
and a ``plainto_tsquery`` from the search term ``'Cheese'``, both using the
default database search configuration. The results are obtained by matching the
query and the vector.

To use the ``search`` lookup, ``'django.contrib.postgres'`` must be in your
:setting:`INSTALLED_APPS`.

.. versionchanged:: 3.1

    Support for query expressions was added.

``SearchVector``
================

.. class:: SearchVector(*expressions, config=None, weight=None)

Searching against a single field is great but rather limiting. The ``Entry``
instances we're searching belong to a ``Blog``, which has a ``tagline`` field.
To query against both fields, use a ``SearchVector``::

    >>> from django.contrib.postgres.search import SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', 'blog__tagline'),
    ... ).filter(search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

The arguments to ``SearchVector`` can be any
:class:`~django.db.models.Expression` or the name of a field. Multiple
arguments will be concatenated together using a space so that the search
document includes them all.

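
As a rough mental model of that concatenation, the following plain-Python
sketch joins several field values into one search document. It is an
illustration only, not Django's or PostgreSQL's implementation:
``build_document`` is a hypothetical helper, and treating a missing value as an
empty string is an assumption of this sketch.

```python
def build_document(*values):
    """Join field values into one search document, separated by spaces."""
    # Assumption of this sketch: a missing value (None) contributes an empty
    # string instead of making the whole document empty.
    return " ".join("" if value is None else str(value) for value in values)


# Hypothetical field values standing in for body_text and blog__tagline.
body_text = "Cheese on toast"
tagline = "All the latest dairy news"

document = build_document(body_text, tagline)
print(document)  # Cheese on toast All the latest dairy news

# A term from either field can now be found in the single document.
print("cheese" in document.lower().split())  # True
print("dairy" in document.lower().split())   # True
```

The real search does more than a word lookup (for example, stemming), so this
only shows why one combined document lets a term match in either field.
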
``SearchVector`` objects can be combined together, allowing you to reuse them.
For example::

    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text') + SearchVector('blog__tagline'),
    ... ).filter(search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

See :ref:`postgresql-fts-search-configuration` and
:ref:`postgresql-fts-weighting-queries` for an explanation of the ``config``
and ``weight`` parameters.

``SearchQuery``
===============

.. class:: SearchQuery(value, config=None, search_type='plain')

``SearchQuery`` translates the terms the user provides into a search query
object that the database compares to a search vector. By default, all the words
the user provides are passed through the stemming algorithms, and then the
database looks for matches for all of the resulting terms.

If ``search_type`` is ``'plain'``, which is the default, the terms are treated
as separate keywords. If ``search_type`` is ``'phrase'``, the terms are treated
as a single phrase. If ``search_type`` is ``'raw'``, then you can provide a
formatted search query with terms and operators. If ``search_type`` is
``'websearch'``, then you can provide a formatted search query, similar to the
one used by web search engines. ``'websearch'`` requires PostgreSQL ≥ 11. Read
PostgreSQL's `Full Text Search docs`_ to learn about differences and syntax.
Examples:

.. _Full Text Search docs: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES

    >>> from django.contrib.postgres.search import SearchQuery
    >>> SearchQuery('red tomato')  # two keywords
    >>> SearchQuery('tomato red')  # same results as above
    >>> SearchQuery('red tomato', search_type='phrase')  # a phrase
    >>> SearchQuery('tomato red', search_type='phrase')  # a different phrase
    >>> SearchQuery("'tomato' & ('red' | 'green')", search_type='raw')  # boolean operators
    >>> SearchQuery("'tomato' ('red' OR 'green')", search_type='websearch')  # websearch operators

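
Each search type corresponds to one of PostgreSQL's query-parsing functions.
The sketch below is a simplified illustration, not Django's implementation:
``SEARCH_TYPE_FUNCTIONS`` and ``tsquery_sql`` are names made up for this
example, and the SQL fragment it renders only approximates what reaches the
database.

```python
# PostgreSQL function that parses the query text for each search type.
SEARCH_TYPE_FUNCTIONS = {
    "plain": "plainto_tsquery",
    "phrase": "phraseto_tsquery",
    "raw": "to_tsquery",
    "websearch": "websearch_to_tsquery",
}


def tsquery_sql(search_type="plain", config=None):
    """Return an SQL fragment with %s placeholders for the query parameters."""
    try:
        function = SEARCH_TYPE_FUNCTIONS[search_type]
    except KeyError:
        raise ValueError(f"Unknown search_type argument {search_type!r}.")
    if config is None:
        # Only the query text is passed; the default configuration applies.
        return f"{function}(%s)"
    # The optional first argument names the search configuration.
    return f"{function}(%s::regconfig, %s)"


print(tsquery_sql())                               # plainto_tsquery(%s)
print(tsquery_sql("phrase"))                       # phraseto_tsquery(%s)
print(tsquery_sql("websearch", config="french"))   # websearch_to_tsquery(%s::regconfig, %s)
```
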
``SearchQuery`` terms can be combined logically to provide more flexibility::

    >>> from django.contrib.postgres.search import SearchQuery
    >>> SearchQuery('meat') & SearchQuery('cheese')  # AND
    >>> SearchQuery('meat') | SearchQuery('cheese')  # OR
    >>> ~SearchQuery('meat')  # NOT

See :ref:`postgresql-fts-search-configuration` for an explanation of the
``config`` parameter.

.. versionchanged:: 3.1

    Support for ``'websearch'`` search type and query expressions in
    ``SearchQuery.value`` were added.

``SearchRank``
==============

.. class:: SearchRank(vector, query, weights=None, normalization=None, cover_density=False)

So far, we've returned the results for which any match between the vector and
the query is possible. You will often want to order the results by some sort
of relevancy. PostgreSQL provides a ranking function which takes into
account how often the query terms appear in the document, how close together
the terms are in the document, and how important the part of the document is
where they occur. The better the match, the higher the value of the rank. To
order by relevancy::

    >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
    >>> vector = SearchVector('body_text')
    >>> query = SearchQuery('cheese')
    >>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

See :ref:`postgresql-fts-weighting-queries` for an explanation of the
``weights`` parameter.

Set the ``cover_density`` parameter to ``True`` to enable the cover density
ranking, which means that the proximity of matching query terms is taken into
account.

Provide an integer to the ``normalization`` parameter to control rank
normalization. This integer is a bit mask, so you can combine multiple
behaviors::

    >>> from django.db.models import Value
    >>> Entry.objects.annotate(
    ...     rank=SearchRank(
    ...         vector,
    ...         query,
    ...         normalization=Value(2).bitor(Value(4)),
    ...     )
    ... )

The PostgreSQL documentation has more details about `different rank
normalization options`_.

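
Because the value is a bit mask, combining two behaviors is a bitwise OR of
their integers. The sketch below spells that out in plain Python. The
``RankNormalization`` class and its member names are hypothetical, chosen for
this example only; the integer values and their meanings follow the options
listed in the PostgreSQL documentation.

```python
from enum import IntFlag


class RankNormalization(IntFlag):
    """Hypothetical names for PostgreSQL's rank normalization bits.

    A value of 0 (no bits set) is PostgreSQL's default and ignores the
    document length.
    """

    LOG_LENGTH = 1          # divide the rank by 1 + log(document length)
    LENGTH = 2              # divide the rank by the document length
    HARMONIC_DISTANCE = 4   # divide by the mean harmonic distance between extents
    UNIQUE_WORDS = 8        # divide by the number of unique words
    LOG_UNIQUE_WORDS = 16   # divide by 1 + log(number of unique words)
    SELF_PLUS_ONE = 32      # divide the rank by itself + 1


# Same combination as Value(2).bitor(Value(4)).
combined = RankNormalization.LENGTH | RankNormalization.HARMONIC_DISTANCE
print(int(combined))  # 6

# Recover which behaviors a given integer enables.
enabled = [flag.name for flag in RankNormalization if flag in combined]
print(enabled)  # ['LENGTH', 'HARMONIC_DISTANCE']
```
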
.. _different rank normalization options: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING

.. versionadded:: 3.1

    The ``normalization`` and ``cover_density`` parameters were added.

``SearchHeadline``
==================

.. versionadded:: 3.1

.. class:: SearchHeadline(expression, query, config=None, start_sel=None, stop_sel=None, max_words=None, min_words=None, short_word=None, highlight_all=None, max_fragments=None, fragment_delimiter=None)

Accepts a single text field or an expression, a query, a config, and a set of
options. Returns highlighted search results.

Set the ``start_sel`` and ``stop_sel`` parameters to the string values to be
used to wrap highlighted query terms in the document. PostgreSQL's defaults are
``<b>`` and ``</b>``.

Provide integer values to the ``max_words`` and ``min_words`` parameters to
determine the longest and shortest headlines. PostgreSQL's defaults are 35 and
15.

Provide an integer value to the ``short_word`` parameter to discard words of
this length or less in each headline. PostgreSQL's default is 3.

Set the ``highlight_all`` parameter to ``True`` to use the whole document in
place of a fragment and ignore the ``max_words``, ``min_words``, and
``short_word`` parameters. That's disabled by default in PostgreSQL.

Provide a non-zero integer value to the ``max_fragments`` parameter to set the
maximum number of fragments to display. That's disabled by default in
PostgreSQL.

Set the ``fragment_delimiter`` string parameter to configure the delimiter
between fragments. PostgreSQL's default is ``" ... "``.

The PostgreSQL documentation has more details on `highlighting search
results`_.

Usage example::

    >>> from django.contrib.postgres.search import SearchHeadline, SearchQuery
    >>> query = SearchQuery('red tomato')
    >>> entry = Entry.objects.annotate(
    ...     headline=SearchHeadline(
    ...         'body_text',
    ...         query,
    ...         start_sel='<span>',
    ...         stop_sel='</span>',
    ...     ),
    ... ).get()
    >>> print(entry.headline)
    Sandwich with <span>tomato</span> and <span>red</span> cheese.

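
To make the role of ``start_sel`` and ``stop_sel`` concrete, here is a toy
highlighter in plain Python. It is a sketch only: ``toy_headline`` is a
hypothetical function, it matches whole words literally without stemming, and
it ignores every other option, so it is not a substitute for what PostgreSQL
does.

```python
import re


def toy_headline(text, terms, start_sel="<b>", stop_sel="</b>"):
    """Wrap each whole-word occurrence of a term in the given delimiters."""
    if not terms:
        return text
    # Match any of the terms as a whole word, ignoring case.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: f"{start_sel}{m.group(0)}{stop_sel}", text)


print(
    toy_headline(
        "Sandwich with tomato and red cheese.",
        ["red", "tomato"],
        start_sel="<span>",
        stop_sel="</span>",
    )
)
# Sandwich with <span>tomato</span> and <span>red</span> cheese.
```
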
See :ref:`postgresql-fts-search-configuration` for an explanation of the
``config`` parameter.

.. _highlighting search results: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-HEADLINE

.. _postgresql-fts-search-configuration:

Changing the search configuration
=================================

You can pass the ``config`` argument to a :class:`SearchVector` and a
:class:`SearchQuery` to use a different search configuration. This allows using
different language parsers and dictionaries as defined by the database::

    >>> from django.contrib.postgres.search import SearchQuery, SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', config='french'),
    ... ).filter(search=SearchQuery('œuf', config='french'))
    [<Entry: Pain perdu>]

The value of ``config`` could also be stored in another column::

    >>> from django.db.models import F
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', config=F('blog__language')),
    ... ).filter(search=SearchQuery('œuf', config=F('blog__language')))
    [<Entry: Pain perdu>]

.. _postgresql-fts-weighting-queries:

Weighting queries
=================

Not every field has the same relevance in a query, so you can set weights
of various vectors before you combine them::

    >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
    >>> vector = SearchVector('body_text', weight='A') + SearchVector('blog__tagline', weight='B')
    >>> query = SearchQuery('cheese')
    >>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by('rank')

The weight should be one of the following letters: D, C, B, A. By default,
these weights refer to the numbers ``0.1``, ``0.2``, ``0.4``, and ``1.0``,
respectively. If you wish to weight them differently, pass a list of four
floats to :class:`SearchRank` as ``weights`` in the same order as above::

    >>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
    >>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by('-rank')

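
The order of the four floats is easy to get backwards, since the list starts
with the least important letter. The plain-Python sketch below pairs each
letter with its position; ``WEIGHT_LETTERS``, ``DEFAULT_WEIGHTS``, and
``label_weights`` are hypothetical names used only for this illustration.

```python
# Position in the weights list, from least to most important.
WEIGHT_LETTERS = ("D", "C", "B", "A")
DEFAULT_WEIGHTS = (0.1, 0.2, 0.4, 1.0)


def label_weights(weights=DEFAULT_WEIGHTS):
    """Return a mapping from weight letter to its numeric value."""
    if len(weights) != len(WEIGHT_LETTERS):
        raise ValueError("Exactly four weights are required (D, C, B, A).")
    return dict(zip(WEIGHT_LETTERS, weights))


print(label_weights())
# {'D': 0.1, 'C': 0.2, 'B': 0.4, 'A': 1.0}
print(label_weights([0.2, 0.4, 0.6, 0.8]))
# {'D': 0.2, 'C': 0.4, 'B': 0.6, 'A': 0.8}
```

With the custom list from the example above, a vector weighted ``'A'`` counts
as ``0.8`` and one weighted ``'B'`` as ``0.6``.
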
Performance
===========

Special database configuration isn't necessary to use any of these functions.
However, if you're searching more than a few hundred records, you're likely to
run into performance problems. Full text search is a more intensive process
than comparing the size of an integer, for example.

If all the fields you're querying on are contained within one particular
model, you can create a functional index which matches the search vector you
wish to use. The PostgreSQL documentation has details on
`creating indexes for full text search
<https://www.postgresql.org/docs/current/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX>`_.

``SearchVectorField``
---------------------

.. class:: SearchVectorField

If this approach becomes too slow, you can add a ``SearchVectorField`` to your
model. You'll need to keep it populated with triggers, for example, as
described in the `PostgreSQL documentation`_. You can then query the field as
if it were an annotated ``SearchVector``::

    >>> Entry.objects.update(search_vector=SearchVector('body_text'))
    >>> Entry.objects.filter(search_vector='cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

.. _PostgreSQL documentation: https://www.postgresql.org/docs/current/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS

Trigram similarity
==================

Another approach to searching is trigram similarity. A trigram is a group of
three consecutive characters. In addition to the :lookup:`trigram_similar`
lookup, you can use a couple of other expressions.

To use them, you need to activate the `pg_trgm extension
<https://www.postgresql.org/docs/current/pgtrgm.html>`_ on PostgreSQL. You can
install it using the
:class:`~django.contrib.postgres.operations.TrigramExtension` migration
operation.

``TrigramSimilarity``
---------------------

.. class:: TrigramSimilarity(expression, string, **extra)

Accepts a field name or expression, and a string or expression. Returns the
trigram similarity between the two arguments.

Usage example::

    >>> from django.contrib.postgres.search import TrigramSimilarity
    >>> Author.objects.create(name='Katy Stevens')
    >>> Author.objects.create(name='Stephen Keats')
    >>> test = 'Katie Stephens'
    >>> Author.objects.annotate(
    ...     similarity=TrigramSimilarity('name', test),
    ... ).filter(similarity__gt=0.3).order_by('-similarity')
    [<Author: Katy Stevens>, <Author: Stephen Keats>]

``TrigramDistance``
-------------------

.. class:: TrigramDistance(expression, string, **extra)

Accepts a field name or expression, and a string or expression. Returns the
trigram distance between the two arguments.

Usage example::

    >>> from django.contrib.postgres.search import TrigramDistance
    >>> Author.objects.create(name='Katy Stevens')
    >>> Author.objects.create(name='Stephen Keats')
    >>> test = 'Katie Stephens'
    >>> Author.objects.annotate(
    ...     distance=TrigramDistance('name', test),
    ... ).filter(distance__lte=0.7).order_by('distance')
    [<Author: Katy Stevens>, <Author: Stephen Keats>]

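
For intuition about what similarity and distance measure, the plain-Python
sketch below approximates the trigram rules described in the ``pg_trgm``
documentation: text is lowercased, split into alphanumeric words, and each word
is padded with two leading spaces and one trailing space before its trigrams
are collected. The function names are hypothetical, and the numbers this sketch
produces are not guaranteed to equal PostgreSQL's in every case.

```python
import re


def trigrams(text):
    """Return the set of trigrams in text, following pg_trgm's padding rule."""
    result = set()
    # Only runs of letters and digits count as words; case is ignored.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def trigram_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    set_a, set_b = trigrams(a), trigrams(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def trigram_distance(a, b):
    """One minus the similarity, so 0.0 means identical trigram sets."""
    return 1.0 - trigram_similarity(a, b)


test = "Katie Stephens"
for name in ("Katy Stevens", "Stephen Keats"):
    print(name, trigram_similarity(name, test), trigram_distance(name, test))
```

In this sketch both names score above ``0.3`` against ``'Katie Stephens'``,
with ``'Katy Stevens'`` ranked first, which matches the ordering shown in the
usage examples above.
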