Unlike relational databases, full-text search is not standardized. There are several open-source options like ElasticSearch, Solr, and Xapian. ElasticSearch is probably the most popular solution; however, it's complicated to set up and maintain. Further, if you're not taking advantage of some of the advanced features that ElasticSearch offers, you should stick with the full-text search capabilities that many relational (like Postgres, MySQL, SQLite) and non-relational databases (like MongoDB and CouchDB) offer. Postgres in particular is well-suited for full-text search. Django supports it out-of-the-box as well.
For the vast majority of your Django apps, you should, at the very least, start out with leveraging full-text search from Postgres before looking to a more powerful solution like ElasticSearch or Solr.
In this tutorial, you'll learn how to add basic and full-text search to a Django app with Postgres. You'll also optimize the full-text search by adding a search vector field and a database index.
This is an intermediate-level tutorial. It assumes that you're familiar with both Django and Docker. Review the Dockerizing Django with Postgres, Gunicorn, and Nginx tutorial for more info.
By the end of this tutorial, you will be able to:
Clone down the base branch from the django-search repo:
$ git clone https://github.com/testdrivenio/django-search --branch base --single-branch
$ cd django-search
You'll use Docker to simplify setting up and running Postgres along with Django.
From the project root, create the images and spin up the Docker containers:
$ docker-compose up -d --build
Next, apply the migrations and create a superuser:
$ docker-compose exec web python manage.py makemigrations
$ docker-compose exec web python manage.py migrate
$ docker-compose exec web python manage.py createsuperuser
Once done, navigate to http://127.0.0.1:8011/quotes/ to ensure the app works as expected. You should see the following:
Want to learn how to work with Django and Postgres? Check out the Dockerizing Django with Postgres, Gunicorn, and Nginx article.
Take note of the Quote model in quotes/models.py:
from django.db import models


class Quote(models.Model):
    name = models.CharField(max_length=250)
    quote = models.TextField(max_length=1000)

    def __str__(self):
        return self.quote
Next, run the following management command to add 10,000 quotes to the database:
$ docker-compose exec web python manage.py add_quotes
This will take a couple of minutes. Once done, navigate to http://127.0.0.1:8011/quotes/ to see the data.
The output of the view is cached for five minutes, so you may want to comment out the @method_decorator in quotes/views.py to load the quotes. Make sure to remove the comment once done.
In the quotes/templates/quote.html file, you have a basic form with a search input field:
<form action="{% url 'search_results' %}" method="get">
<input
type="search"
name="q"
placeholder="Search by name or quote..."
class="form-control"
/>
</form>
On submit, the form sends the data to the backend. A GET request is used rather than a POST so that the query string is available both in the URL and in the Django view, allowing users to share search results as links.
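To see why a GET request makes results shareable, here is a minimal sketch using only Python's standard library. The `/search/` path and the helper names `build_search_url` and `read_search_term` are assumptions made up for illustration; they are not part of the project.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base URL: the host and port come from this tutorial,
# but the "/search/" path is an assumption for illustration only.
BASE_URL = "http://127.0.0.1:8011/search/"


def build_search_url(term: str) -> str:
    """Return a shareable URL carrying the search term in the query string."""
    return f"{BASE_URL}?{urlencode({'q': term})}"


def read_search_term(url: str) -> str:
    """Recover the term, roughly what request.GET.get("q") does in the view."""
    params = parse_qs(urlsplit(url).query)
    return params.get("q", [""])[0]


url = build_search_url("the middle")
print(url)                    # http://127.0.0.1:8011/search/?q=the+middle
print(read_search_term(url))  # the middle
```

Because the whole search lives in the URL, anyone who opens the link runs the same query.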
Before proceeding further, take a quick look at the project structure and the rest of the code.
When it comes to search with Django, you'll typically start by performing search queries with contains or icontains, which match the query text literally. The Q object can be used as well to add AND (&) or OR (|) logical operators.
For instance, using the OR operator, override the SearchResultsList's default QuerySet in quotes/views.py like so:
class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        return Quote.objects.filter(
            Q(name__icontains=query) | Q(quote__icontains=query)
        )
Here, we used the filter method to filter against the name or quote fields. We also used the icontains lookup to check whether the query is present in the name or quote fields (case insensitive). A row is returned if either field contains a match.
Don't forget the import:
from django.db.models import Q
Try it out:
For small data sets, this is a great way to add basic search functionality to your app. If you're dealing with a large data set or want search functionality that feels like an Internet search engine, you'll want to move to full-text search.
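As a rough mental model of what the Q-based filter does, the sketch below applies the same "name OR quote contains the query, ignoring case" rule to an in-memory list. The `Row` class, the `basic_search` helper, and the sample rows are invented for illustration, and `casefold()` only approximates how the database compares text.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    quote: str


def basic_search(rows, query):
    """Keep rows whose name OR quote contains the query, ignoring case."""
    needle = query.casefold()
    return [
        row for row in rows
        if needle in row.name.casefold() or needle in row.quote.casefold()
    ]


# Made-up sample data, not taken from the tutorial's database.
rows = [
    Row("Ada", "Ponies are small horses."),
    Row("Pony Express", "Mail travels fast."),
    Row("Grace", "A pony is a small horse."),
]

print([r.name for r in basic_search(rows, "pony")])    # ['Pony Express', 'Grace']
print([r.name for r in basic_search(rows, "ponies")])  # ['Ada']
```

Note that "pony" does not find the "Ponies" row: the match is purely literal.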
The basic search that we saw earlier has several limitations especially when you want to perform complex lookups.
As mentioned, with basic search, you can only perform exact matches.
Another limitation is that of stop words. Stop words are words such as "a", "an", and "the". These words are common and carry little meaning, so they should be ignored. To test, try searching for a word with "the" in front of it. Say you searched for "the middle". In this case, you'll only see results for "the middle", so you won't see any results that have the word "middle" without "the" before it.
Say you have these two sentences:
You'll get the following returned with each type of search:
Query | Basic Search | Full-text Search |
---|---|---|
"the middle" | 1 | 1 and 2 |
"middle" | 1 and 2 | 1 and 2 |
Another issue is that of ignoring similar words. With basic search, only exact matches are returned. However, with full-text search, similar words are accounted for. To test, try to find some similar words like "pony" and "ponies". With basic search, if you search for "pony" you won't see results that contain "ponies" -- and vice versa.
Say you have these two sentences.
You'll get the following returned with each type of search:
Query | Basic Search | Full-text Search |
---|---|---|
"pony" | 1 | 1 and 2 |
"ponies" | 2 | 1 and 2 |
With full-text search, both of these issues are mitigated. However, keep in mind that depending on your goal, full-text search may actually decrease precision (quality) and recall (quantity of relevant results). Typically, full-text search is less precise than basic search, since basic search yields exact matches. That said, if you're searching through large data sets with large blocks of text, full-text search is preferred since it's usually much faster.
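The difference between the two approaches can be sketched in a few lines of plain Python. This is a toy model only: the stop-word list, the `toy_stem` rule, and the sample sentences are all invented for illustration and are far simpler than what Postgres actually does.

```python
import re

# Tiny, hand-picked stop-word list for illustration; real configurations
# ship much longer lists.
STOP_WORDS = {"a", "an", "the", "in", "of", "is", "are"}


def toy_stem(word):
    """Very rough plural stripping; NOT the stemmer Postgres uses."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def normalize(text):
    """Lowercase, split into words, drop stop words, stem what is left."""
    words = re.findall(r"[a-z]+", text.lower())
    return {toy_stem(w) for w in words if w not in STOP_WORDS}


def full_text_match(document, query):
    """A document matches when it contains every normalized query term."""
    return normalize(query) <= normalize(document)


# Hypothetical sentences, invented for this sketch.
docs = ["I am in the middle", "Middle of nowhere", "My pony is fast", "Ponies run"]

# Basic (substring) search for "the middle" finds only the first sentence.
print([d for d in docs if "the middle" in d.lower()])
# The toy full-text search ignores "the" and finds both "middle" sentences.
print([d for d in docs if full_text_match(d, "the middle")])
# "pony" also matches "Ponies" once both are reduced to the same stem.
print([d for d in docs if full_text_match(d, "pony")])
```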
Full-text search is an advanced searching technique that examines all the words in every stored document as it tries to match the search criteria. In addition, with full-text search, you can employ language-specific stemming on the words being indexed. For instance, the words "drives", "drove", and "driven" can be recorded under the single concept word "drive". Stemming is the process of reducing words to their word stem, base, or root form.
It suffices to say that full-text search is not perfect. It's likely to retrieve many documents that are not relevant (false positives) to the intended search query. However, there are some techniques based on Bayesian algorithms that can help reduce such problems.
To take advantage of Postgres full-text search with Django, add django.contrib.postgres to your INSTALLED_APPS list:
INSTALLED_APPS = [
    ...
    "django.contrib.postgres",  # new
]
Next, let's look at two quick examples of full-text search, on a single field and on multiple fields.
Update the get_queryset method of the SearchResultsList view like so:
class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        return Quote.objects.filter(quote__search=query)
Here, we set up full-text search against a single field -- the quote field.
As you can see, it takes similar words into account. In the above example, "ponies" and "pony" are treated as similar words.
To search against multiple fields and on related models, you can use the SearchVector class.
Again, update SearchResultsList:
class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        return Quote.objects.annotate(search=SearchVector("name", "quote")).filter(
            search=query
        )
To search against multiple fields, you annotated the queryset using a SearchVector. The vector is the data that you're searching through, converted into a form that is easy to search. In the example above, this data is the name and quote fields in your database.
Make sure to add the import:
from django.contrib.postgres.search import SearchVector
Try some searches out.
In this section, you'll combine several classes (SearchVector, SearchQuery, and SearchRank) to produce a very robust search that uses both stemming and ranking.
Again, stemming is the process of reducing words to their word stem, base, or root form. With stemming, words like "pony" and "ponies" will be treated as similar words. Ranking, on the other hand, allows us to order results by relevancy.
Update SearchResultsList:
class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        search_vector = SearchVector("name", "quote")
        search_query = SearchQuery(query)
        return (
            Quote.objects.annotate(
                search=search_vector, rank=SearchRank(search_vector, search_query)
            )
            .filter(search=search_query)
            .order_by("-rank")
        )
What's happening here?
SearchVector - again, you used a search vector to search against multiple fields. The data is converted into another form since you're no longer just searching the raw text like you did when icontains was used. As a result, you can search plurals easily. For example, searching for "flask" and "flasks" will yield the same results because they are, well, basically the same thing.
SearchQuery - translates the words provided to us as a query from the form, passes them through a stemming algorithm, and then looks for matches for all of the resulting terms.
SearchRank - allows us to order the results by relevancy. It takes into account how often the query terms appear in the document, how close the terms are in the document, and how important the part of the document is where they occur.
Add the imports:
from django.contrib.postgres.search import SearchVector, SearchQuery, SearchRank
Compare the results from the basic search to those of the full-text search. There's a clear difference. In the full-text search, the results with the highest rank are shown first. This is the power of SearchRank. Combining SearchVector, SearchQuery, and SearchRank is a quick way to produce a much more powerful and precise search than the basic search.
Full-text search gives us the ability to add more importance to some fields in our table in the database over other fields. We can achieve this by adding weights to our queries.
The weight should be one of the following letters D, C, B, A. By default, these weights refer to the numbers 0.1, 0.2, 0.4, and 1.0, respectively.
Update SearchResultsList:
class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        search_vector = SearchVector("name", weight="B") + SearchVector(
            "quote", weight="A"
        )
        search_query = SearchQuery(query)
        return (
            Quote.objects.annotate(rank=SearchRank(search_vector, search_query))
            .filter(rank__gte=0.3)
            .order_by("-rank")
        )
Here, you added weights to the SearchVector using both the name and quote fields. Weights of 0.4 and 1.0 were applied to the name and quote fields, respectively. Therefore, quote matches will prevail over name content matches. Finally, you filtered the results to display only the ones with a rank of at least 0.3.
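To build intuition for how weights influence ordering, here is a toy scoring sketch in plain Python. The letter-to-number mapping comes from the defaults listed above, but the scoring rule in `toy_rank` (add up the weights of the fields that contain the term) is an invented simplification, as are the sample rows. The real rank is computed by Postgres with a different formula, so the absolute values, and therefore which rows pass a given threshold, will differ.

```python
# Default numeric values for the weight letters, as listed above.
WEIGHTS = {"D": 0.1, "C": 0.2, "B": 0.4, "A": 1.0}

# Field-to-letter assignment mirroring the view: name is "B", quote is "A".
FIELD_WEIGHTS = {"name": "B", "quote": "A"}


def toy_rank(row, term):
    """Sum the weights of the fields that contain the term (toy formula)."""
    term = term.lower()
    return sum(
        WEIGHTS[letter]
        for field, letter in FIELD_WEIGHTS.items()
        if term in row[field].lower().split()
    )


def ranked(rows, term, min_rank=0.3):
    """Score rows, drop those below min_rank, and sort best first."""
    scored = [(toy_rank(row, term), row) for row in rows]
    kept = [(score, row) for score, row in scored if score >= min_rank]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Invented rows for illustration.
rows = [
    {"name": "Hope Smith", "quote": "keep going"},
    {"name": "Ann Lee", "quote": "hope is a good thing"},
    {"name": "Bob Ray", "quote": "nothing here"},
]

for score, row in ranked(rows, "hope"):
    print(score, row["name"])
```

In this toy model the row that matches in its quote field is listed before the row that matches only in its name, and the row with no match is dropped.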
In this section, you'll add a little preview of your search result via the SearchHeadline class. This will highlight the search query within the result.
Update SearchResultsList again:
class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        search_vector = SearchVector("name", "quote")
        search_query = SearchQuery(query)
        search_headline = SearchHeadline("quote", search_query)
        return (
            Quote.objects.annotate(
                search=search_vector,
                rank=SearchRank(search_vector, search_query),
            )
            .annotate(headline=search_headline)
            .filter(search=search_query)
            .order_by("-rank")
        )
SearchHeadline takes in the field you want to preview (in this case, the quote field) along with the query, whose matches will be shown in bold.
Make sure to add the import:
from django.contrib.postgres.search import SearchVector, SearchQuery, SearchRank, SearchHeadline
Before trying out some searches, update the <li></li> in quotes/templates/search.html like so:
<li>{{ quote.headline | safe }} - <b>By <i>{{ quote.name }}</i></b></li>
Now, instead of showing the quotes as you did before, only a preview of the full quote field is displayed along with the highlighted search query.
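The highlighting idea can be sketched in plain Python. The `toy_headline` helper below is hypothetical and much simpler than the database function behind SearchHeadline: it only wraps whole-word, case-insensitive matches in bold tags. Since the template marks the headline as safe, the sketch escapes the text before adding any markup; treat that as a design suggestion rather than a description of what Postgres does.

```python
import html
import re


def toy_headline(text, query, start="<b>", stop="</b>"):
    """Escape the text, then wrap whole-word matches of each query term."""
    escaped = html.escape(text)
    terms = [re.escape(html.escape(t)) for t in query.split()]
    if not terms:
        return escaped
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{start}{m.group(0)}{stop}", escaped)


print(toy_headline("Hope is a good thing", "hope"))
# <b>Hope</b> is a good thing
print(toy_headline("<i>hope</i> & glory", "glory"))
# &lt;i&gt;hope&lt;/i&gt; &amp; <b>glory</b>
```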
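Full-text search is an intensive process. To combat slow performance, you can save the search vectors to the database in a SearchVectorField that is updated whenever the quote or name fields change, and you can create a database index on that field.
To learn more about performance with full-text search, review the Performance section from the Django docs.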
Start by adding a new SearchVectorField field to the Quote model in quotes/models.py:
from django.contrib.postgres.search import SearchVectorField  # new
from django.db import models


class Quote(models.Model):
    name = models.CharField(max_length=250)
    quote = models.TextField(max_length=1000)
    search_vector = SearchVectorField(null=True)  # new

    def __str__(self):
        return self.quote
Create the migration file:
$ docker-compose exec web python manage.py makemigrations
Now, you can only populate this field once the quote or name values already exist in the database. Thus, we need to add a trigger to update the search_vector field whenever the quote or name fields are updated. To achieve this, create a custom migration file in "quotes/migrations" called 0003_search_vector_trigger.py:
from django.contrib.postgres.search import SearchVector
from django.db import migrations


def compute_search_vector(apps, schema_editor):
    Quote = apps.get_model("quotes", "Quote")
    Quote.objects.update(search_vector=SearchVector("name", "quote"))


class Migration(migrations.Migration):

    dependencies = [
        ("quotes", "0002_quote_search_vector"),
    ]

    operations = [
        migrations.RunSQL(
            sql="""
            CREATE TRIGGER search_vector_trigger
            BEFORE INSERT OR UPDATE OF name, quote, search_vector
            ON quotes_quote
            FOR EACH ROW EXECUTE PROCEDURE
            tsvector_update_trigger(
                search_vector, 'pg_catalog.english', name, quote
            );
            UPDATE quotes_quote SET search_vector = NULL;
            """,
            reverse_sql="""
            DROP TRIGGER IF EXISTS search_vector_trigger
            ON quotes_quote;
            """,
        ),
        migrations.RunPython(
            compute_search_vector, reverse_code=migrations.RunPython.noop
        ),
    ]
Depending on your project structure, you may need to update the name of the previous migration file in dependencies.
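If triggers are new to you, the sketch below shows the general pattern of letting the database keep a derived column in sync. It deliberately uses SQLite from Python's standard library as a stand-in so it runs anywhere. The syntax differs from the Postgres trigger in the migration, and the `search_text` column with its lowercased text is only an assumed placeholder for a real search vector.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE quotes_quote (
        id INTEGER PRIMARY KEY,
        name TEXT,
        quote TEXT,
        search_text TEXT  -- stand-in for the search vector column
    );

    -- Fill the derived column for every new row.
    CREATE TRIGGER search_text_insert
    AFTER INSERT ON quotes_quote
    BEGIN
        UPDATE quotes_quote
        SET search_text = lower(NEW.name || ' ' || NEW.quote)
        WHERE id = NEW.id;
    END;

    -- Refresh it whenever name or quote changes.
    CREATE TRIGGER search_text_update
    AFTER UPDATE OF name, quote ON quotes_quote
    BEGIN
        UPDATE quotes_quote
        SET search_text = lower(NEW.name || ' ' || NEW.quote)
        WHERE id = NEW.id;
    END;
    """
)

conn.execute(
    "INSERT INTO quotes_quote (name, quote) VALUES (?, ?)",
    ("Ann Lee", "Hope is a good thing"),
)
conn.execute("UPDATE quotes_quote SET quote = ? WHERE id = 1", ("Keep going",))
search_text = conn.execute("SELECT search_text FROM quotes_quote").fetchone()[0]
print(search_text)  # ann lee keep going
```

The application code never touches the derived column; the database updates it on every insert or relevant update, which is the same division of labor the migration sets up.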
Apply the migrations:
$ docker-compose exec web python manage.py migrate
To use the new field for searches, update SearchResultsList like so:
class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        return Quote.objects.filter(search_vector=query)
Update the <li></li> in quotes/templates/search.html again:
<li>{{ quote.quote | safe }} - <b>By <i>{{ quote.name }}</i></b></li>
Finally, let's set up a database index, GinIndex, on the search_vector field.
Update the Quote model:
from django.contrib.postgres.indexes import GinIndex  # new
from django.contrib.postgres.search import SearchVectorField
from django.db import models


class Quote(models.Model):
    name = models.CharField(max_length=250)
    quote = models.TextField(max_length=1000)
    search_vector = SearchVectorField(null=True)

    def __str__(self):
        return self.quote

    # new
    class Meta:
        indexes = [
            GinIndex(fields=["search_vector"]),
        ]
Create and apply the migrations one last time:
$ docker-compose exec web python manage.py makemigrations
$ docker-compose exec web python manage.py migrate
Test it out.
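GIN stands for Generalized Inverted Index. As a rough illustration of why an inverted index speeds up lookups, here is a toy version in plain Python that maps each word to the rows containing it. The helper names and sample rows are invented, and the real index stores normalized lexemes on disk rather than raw words in a dictionary.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(row_id)
    return index


def lookup(index, query):
    """Return ids of rows containing every query word, via set intersection."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Invented rows keyed by primary key.
documents = {
    1: "hope is a good thing",
    2: "keep hope alive",
    3: "nothing good here",
}
index = build_inverted_index(documents)

print(sorted(lookup(index, "hope")))       # [1, 2]
print(sorted(lookup(index, "good hope")))  # [1]
```

Instead of scanning every row for each search, the lookup jumps straight to the rows listed under each query word.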
In this tutorial, you were guided through adding basic and full-text search to a Django application. We also took a look at how to optimize the full-text search functionality by adding a search vector field and a database index.
Grab the complete code from the django-search repo.
This is an intermediate-level tutorial. It assumes that you're familiar with both Django and Docker. Review the Dockerizing Django with Postgres, Gunicorn, and Nginx tutorial for more info.
By the end of this tutorial, you will be able to:
Clone down the base branch from the django-search repo:
$ git clone https://github.com/testdrivenio/django-search --branch base --single-branch
$ cd django-search
You'll use Docker to simplify setting up and running Postgres along with Django.
From the project root, create the images and spin up the Docker containers:
$ docker-compose up -d --build
Next, apply the migrations and create a superuser:
$ docker-compose exec web python manage.py makemigrations
$ docker-compose exec web python manage.py migrate
$ docker-compose exec web python manage.py createsuperuser
Once done, navigate to http://127.0.0.1:8011/quotes/ to ensure the app works as expected. You should see the following:
Want to learn how to work with Django and Postgres? Check out the Dockerizing Django with Postgres, Gunicorn, and Nginx article.
Take note of the Quote
model in quotes/models.py:
from django.db import models
class Quote(models.Model):
name = models.CharField(max_length=250)
quote = models.TextField(max_length=1000)
def __str__(self):
return self.quote
Next, run the following management command to add 10,000 quotes to the database:
$ docker-compose exec web python manage.py add_quotes
This will take a couple of minutes. Once done, navigate to http://127.0.0.1:8011/quotes/ to see the data.
The output of the view is cached for five minutes, so you may want to comment out the
@method_decorator
in quotes/views.py to load the quotes. Make sure to remove the comment once done.
In the quotes/templates/quote.html file, you have a basic form with a search input field:
<form action="{% url 'search_results' %}" method="get">
<input
type="search"
name="q"
placeholder="Search by name or quote..."
class="form-control"
/>
</form>
On submit, the form sends the data to the backend. A GET
request is used rather than a POST
so that way we have access to the query string both in the URL and in the Django view, allowing users to share search results as links.
Before proceeding further, take a quick look at the project structure and the rest of the code.
When it comes to search, with Django, you'll typically start by performing search queries with contains
or icontains
for exact matches. The Q object can be used as well to add AND (&
) or OR (|
) logical operators.
For instance, using the OR operator, override theSearchResultsList
's default QuerySet
in quotes/views.py like so:
class SearchResultsList(ListView):
model = Quote
context_object_name = "quotes"
template_name = "search.html"
def get_queryset(self):
query = self.request.GET.get("q")
return Quote.objects.filter(
Q(name__icontains=query) | Q(quote__icontains=query)
)
Here, we used the filter method to filter against the name
or quote
fields. Furthermore, we also used the icontains extension to check if the query is present in the name
or quote
fields (case insensitive). A positive result will be returned if a match is found.
Don't forget the import:
from django.db.models import Q
Try it out:
For small data sets, this is a great way to add basic search functionality to your app. If you're dealing with a large data set or want search functionality that feels like an Internet search engine, you'll want to move to full-text search.
The basic search that we saw earlier has several limitations especially when you want to perform complex lookups.
As mentioned, with basic search, you can only perform exact matches.
Another limitation is that of stop words. Stop words are words such as "a", "an", and "the". These words are common and insufficiently meaningful, therefore they should be ignored. To test, try searching for a word with "the" in front of it. Say you searched for "the middle". In this case, you'll only see results for "the middle", so you won't see any results that have the word "middle" without "the" before it.
Say you have these two sentences:
You'll get the following returned with each type of search:
Query | Basic Search | Full-text Search |
---|---|---|
"the middle" | 1 | 1 and 2 |
"middle" | 1 and 2 | 1 and 2 |
Another issue is that of ignoring similar words. With basic search, only exact matches are returned. However, with full-text search, similar words are accounted for. To test, try to find some similar words like "pony" and "ponies". With basic search, if you search for "pony" you won't see results that contain "ponies" -- and vice versa.
Say you have these two sentences.
You'll get the following returned with each type of search:
Query | Basic Search | Full-text Search |
---|---|---|
"pony" | 1 | 1 and 2 |
"ponies" | 2 | 1 and 2 |
With full-text search, both of these issues are mitigated. However, keep in mind that depending on your goal, full-text search may actually decrease precision (quality) and recall (quantity of relevant results). Typically, full-text search is less precise than basic search, since basic search yields exact matches. That said, if you're searching through large data sets with large blocks of text, full-text search is preferred since it's usually much faster.
Full-text search is an advanced searching technique that examines all the words in every stored document as it tries to match the search criteria. In addition, with full-text search, you can employ language-specific stemming on the words being indexed. For instance, the word "drives", "drove", and "driven" will be recorded under the single concept word "drive". Stemming is the process of reducing words to their word stem, base, or root form.
It suffices to say that full-text search is not perfect. It's likely to retrieve many documents that are not relevant (false positives) to the intended search query. However, there are some techniques based on Bayesian algorithms that can help reduce such problems.
To take advantage of Postgres full-text search with Django, add django.contrib.postgres
to your INSTALLED_APPS
list:
INSTALLED_APPS = [
...
"django.contrib.postgres", # new
]
Next, let's look at two quick examples of full-text search, on a single field and on multiple fields.
Update the get_queryset
function under the SearchResultsList
view function like so:
class SearchResultsList(ListView):
model = Quote
context_object_name = "quotes"
template_name = "search.html"
def get_queryset(self):
query = self.request.GET.get("q")
return Quote.objects.filter(quote__search=query)
Here, we set up full-text search against a single field -- the quote field.
As you can see, it takes similar words into account. In the above example, "ponies" and "pony" are treated as similar words.
To search against multiple fields and on related models, you can use the SearchVector
class.
Again, update SearchResultsList
:
class SearchResultsList(ListView):
model = Quote
context_object_name = "quotes"
template_name = "search.html"
def get_queryset(self):
query = self.request.GET.get("q")
return Quote.objects.annotate(search=SearchVector("name", "quote")).filter(
search=query
)
To search against multiple fields, you annotated the queryset using a SearchVector
. The vector is the data that you're searching for, which has been converted into a form that is easy to search. In the example above, this data is the name
and quote
fields in your database.
Make sure to add the import:
from django.contrib.postgres.search import SearchVector
Try some searches out.
In this section, you'll combine several methods such as SearchVector, SearchQuery, and SearchRank to produce a very robust search that uses both stemming and ranking.
Again, stemming is the process of reducing words to their word stem, base, or root form. With stemming, words like "child" and "children" will be treated as similar words. Ranking, on the other hand, allows us to order results by relevancy.
Update SearchResultsList
:
class SearchResultsList(ListView):
model = Quote
context_object_name = "quotes"
template_name = "search.html"
def get_queryset(self):
query = self.request.GET.get("q")
search_vector = SearchVector("name", "quote")
search_query = SearchQuery(query)
return (
Quote.objects.annotate(
search=search_vector, rank=SearchRank(search_vector, search_query)
)
.filter(search=search_query)
.order_by("-rank")
)
What's happening here?
SearchVector
- again you used a search vector to search against multiple fields. The data is converted into another form since you're no longer just searching the raw text like you did when icontains
was used. Therefore, with this, you will be able to search plurals easily. For example, searching for "flask" and "flasks" will yield the same search because they are, well, basically the same thing.SearchQuery
- translates the words provided to us as a query from the form, passes them through a stemming algorithm, and then it looks for matches for all of the resulting terms.SearchRank
- allows us to order the results by relevancy. It takes into account how often the query terms appear in the document, how close the terms are on the document, and how important the part of the document is where they occur.Add the imports:
from django.contrib.postgres.search import SearchVector, SearchQuery, SearchRank
Compare the results from the basic search to that of the full-text search. There's a clear difference. In the full-text search, the query with the highest results is shown first. This is the power of SearchRank
. Combining SearchVector
, SearchQuery
, and SearchRank
is a quick way to produce a much more powerful and precise search than the basic search.
Full-text search gives us the ability to add more importance to some fields in our table in the database over other fields. We can achieve this by adding weights to our queries.
The weight should be one of the following letters D, C, B, A. By default, these weights refer to the numbers 0.1, 0.2, 0.4, and 1.0, respectively.
Update SearchResultsList
:
class SearchResultsList(ListView):
model = Quote
context_object_name = "quotes"
template_name = "search.html"
def get_queryset(self):
query = self.request.GET.get("q")
search_vector = SearchVector("name", weight="B") + SearchVector(
"quote", weight="A"
)
search_query = SearchQuery(query)
return (
Quote.objects.annotate(rank=SearchRank(search_vector, search_query))
.filter(rank__gte=0.3)
.order_by("-rank")
)
Here, you added weights to the SearchVector
using both the name
and quote
fields. Weights of 0.4 and 1.0 were applied to the name and quote fields, respectively. Therefore, quote matches will prevail over name content matches. Finally, you filtered the results to display only the ones that are greater than 0.3.
In this section, you'll add a little preview of your search result via the SearchHeadline method. This will highlight the search result query.
Update SearchResultsList
again:
class SearchResultsList(ListView):
model = Quote
context_object_name = "quotes"
template_name = "search.html"
def get_queryset(self):
query = self.request.GET.get("q")
search_vector = SearchVector("name", "quote")
search_query = SearchQuery(query)
search_headline = SearchHeadline("quote", search_query)
return Quote.objects.annotate(
search=search_vector,
rank=SearchRank(search_vector, search_query)
).annotate(headline=search_headline).filter(search=search_query).order_by("-rank")
The SearchHeadline
takes in the field you want to preview. In this case, this will be the quote
field along with the query, which will be in bold.
Make sure to add the import:
from django.contrib.postgres.search import SearchVector, SearchQuery, SearchRank, SearchHeadline
Before trying out some searches, update the <li></li>
in quotes/templates/search.html like so:
<li>{{ quote.headline | safe }} - <b>By <i>{{ quote.name }}</i></b></li>
Now, instead of showing the quotes as you did before, only a preview of the full quote field is displayed along with the highlighted search query.
Full-text search is an intensive process. To combat slow performance, you can:
quote
or name
fields.To learn more about performance with full-text search, review the Performance section from the Django docs.
Start by adding a new SearchVectorField field to the Quote
model in quotes/models.py:
from django.contrib.postgres.search import SearchVectorField # new
from django.db import models
class Quote(models.Model):
name = models.CharField(max_length=250)
quote = models.TextField(max_length=1000)
search_vector = SearchVectorField(null=True) # new
def __str__(self):
return self.quote
Create the migration file:
$ docker-compose exec web python manage.py makemigrations
Now, you can only populate this field when the quote
or name
objects already exists in the database. Thus, we need to add a trigger to update the search_vector
field whenever the quote
or name
fields are updated. To achieve this, create a custom migration file in "quotes/migrations" called 0003_search_vector_trigger.py:
from django.contrib.postgres.search import SearchVector
from django.db import migrations
def compute_search_vector(apps, schema_editor):
Quote = apps.get_model("quotes", "Quote")
Quote.objects.update(search_vector=SearchVector("name", "quote"))
class Migration(migrations.Migration):
dependencies = [
("quotes", "0002_quote_search_vector"),
]
operations = [
migrations.RunSQL(
sql="""
CREATE TRIGGER search_vector_trigger
BEFORE INSERT OR UPDATE OF name, quote, search_vector
ON quotes_quote
FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(
search_vector, 'pg_catalog.english', name, quote
);
UPDATE quotes_quote SET search_vector = NULL;
""",
reverse_sql="""
DROP TRIGGER IF EXISTS search_vector_trigger
ON quotes_quote;
""",
),
migrations.RunPython(
compute_search_vector, reverse_code=migrations.RunPython.noop
),
]
Depending on your project structure, you may need to update the name of the previous migration file in
dependencies
.
Apply the migrations:
$ docker-compose exec web python manage.py migrate
To use the new field for searches, update SearchResultsList
like so:
class SearchResultsList(ListView):
model = Quote
context_object_name = "quotes"
template_name = "search.html"
def get_queryset(self):
query = self.request.GET.get("q")
return Quote.objects.filter(search_vector=query)
Update the <li></li>
in quotes/templates/search.html again:
<li>{{ quote.quote | safe }} - <b>By <i>{{ quote.name }}</i></b></li>
Finally, let's set up a functional index, GinIndex.
Update the Quote
model:
from django.contrib.postgres.indexes import GinIndex  # new
from django.contrib.postgres.search import SearchVectorField
from django.db import models


class Quote(models.Model):
    name = models.CharField(max_length=250)
    quote = models.TextField(max_length=1000)
    search_vector = SearchVectorField(null=True)

    def __str__(self):
        return self.quote

    # new
    class Meta:
        indexes = [
            GinIndex(fields=["search_vector"]),
        ]
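GIN stands for Generalized Inverted Index. As a rough intuition for why it speeds up lookups, here is a toy inverted index in plain Python. It is only a sketch: the function names and sample rows are made up for illustration, there is no stemming or stop-word handling, and Postgres's real GIN implementation is far more involved.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each lowercase word to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for token in text.lower().split():
            word = token.strip('.,!?;:"')
            if word:
                index[word].add(row_id)
    return index


def lookup(index, word):
    """Return the sorted row ids containing the word, without scanning any text."""
    return sorted(index.get(word.lower(), set()))


rows = {
    1: "I am in the middle.",
    2: "You are in the middle of nowhere.",
    3: "Ponies are small horses.",
}
index = build_inverted_index(rows)
print(lookup(index, "middle"))
print(lookup(index, "ponies"))
```

A lookup goes straight from a word to the matching rows instead of scanning every row, which is the kind of access pattern the GIN index provides for the search_vector column.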
Create and apply the migrations one last time:
$ docker-compose exec web python manage.py makemigrations
$ docker-compose exec web python manage.py migrate
Test it out.
In this tutorial, you added basic and full-text search to a Django application. You also optimized the full-text search by adding a search vector field and a database index.
Grab the complete code from the django-search repo.
Original article source at: https://testdriven.io/
This is an intermediate-level tutorial. It assumes that you're familiar with both Django and Docker. Review the Dockerizing Django with Postgres, Gunicorn, and Nginx tutorial for more info.
By the end of this tutorial, you will be able to:
Clone down the base branch from the django-search repo:
$ git clone https://github.com/testdrivenio/django-search --branch base --single-branch
$ cd django-search
You'll use Docker to simplify setting up and running Postgres along with Django.
From the project root, create the images and spin up the Docker containers:
$ docker-compose up -d --build
Next, apply the migrations and create a superuser:
$ docker-compose exec web python manage.py makemigrations
$ docker-compose exec web python manage.py migrate
$ docker-compose exec web python manage.py createsuperuser
Once done, navigate to http://127.0.0.1:8011/quotes/ to ensure the app works as expected. You should see the following:
Want to learn how to work with Django and Postgres? Check out the Dockerizing Django with Postgres, Gunicorn, and Nginx article.
Take note of the Quote
model in quotes/models.py:
from django.db import models
class Quote(models.Model):
name = models.CharField(max_length=250)
quote = models.TextField(max_length=1000)
def __str__(self):
return self.quote
Next, run the following management command to add 10,000 quotes to the database:
$ docker-compose exec web python manage.py add_quotes
This will take a couple of minutes. Once done, navigate to http://127.0.0.1:8011/quotes/ to see the data.
The output of the view is cached for five minutes, so you may want to comment out the
@method_decorator
in quotes/views.py to load the quotes. Make sure to remove the comment once done.
In the quotes/templates/quote.html file, you have a basic form with a search input field:
<form action="{% url 'search_results' %}" method="get">
<input
type="search"
name="q"
placeholder="Search by name or quote..."
class="form-control"
/>
</form>
On submit, the form sends the data to the backend. A GET
request is used rather than a POST
so that way we have access to the query string both in the URL and in the Django view, allowing users to share search results as links.
Before proceeding further, take a quick look at the project structure and the rest of the code.
When it comes to search, with Django, you'll typically start by performing search queries with contains
or icontains
for exact matches. The Q object can be used as well to add AND (&
) or OR (|
) logical operators.
For instance, using the OR operator, override theSearchResultsList
's default QuerySet
in quotes/views.py like so:
class SearchResultsList(ListView):
model = Quote
context_object_name = "quotes"
template_name = "search.html"
def get_queryset(self):
query = self.request.GET.get("q")
return Quote.objects.filter(
Q(name__icontains=query) | Q(quote__icontains=query)
)
Here, we used the filter method to filter against the name
or quote
fields. Furthermore, we also used the icontains extension to check if the query is present in the name
or quote
fields (case insensitive). A positive result will be returned if a match is found.
Don't forget the import:
from django.db.models import Q
Try it out:
For small data sets, this is a great way to add basic search functionality to your app. If you're dealing with a large data set or want search functionality that feels like an Internet search engine, you'll want to move to full-text search.
The basic search that we saw earlier has several limitations especially when you want to perform complex lookups.
As mentioned, with basic search, you can only perform exact matches.
Another limitation is that of stop words. Stop words are words such as "a", "an", and "the". These words are common and insufficiently meaningful, therefore they should be ignored. To test, try searching for a word with "the" in front of it. Say you searched for "the middle". In this case, you'll only see results for "the middle", so you won't see any results that have the word "middle" without "the" before it.
Say you have these two sentences:
You'll get the following returned with each type of search:
Query | Basic Search | Full-text Search |
---|---|---|
"the middle" | 1 | 1 and 2 |
"middle" | 1 and 2 | 1 and 2 |
Another issue is that of ignoring similar words. With basic search, only exact matches are returned. However, with full-text search, similar words are accounted for. To test, try to find some similar words like "pony" and "ponies". With basic search, if you search for "pony" you won't see results that contain "ponies" -- and vice versa.
Say you have these two sentences.
You'll get the following returned with each type of search:
Query | Basic Search | Full-text Search |
---|---|---|
"pony" | 1 | 1 and 2 |
"ponies" | 2 | 1 and 2 |
With full-text search, both of these issues are mitigated. However, keep in mind that depending on your goal, full-text search may actually decrease precision (quality) and recall (quantity of relevant results). Typically, full-text search is less precise than basic search, since basic search yields exact matches. That said, if you're searching through large data sets with large blocks of text, full-text search is preferred since it's usually much faster.
Full-text search is an advanced searching technique that examines all the words in every stored document as it tries to match the search criteria. In addition, with full-text search, you can employ language-specific stemming on the words being indexed. For instance, the word "drives", "drove", and "driven" will be recorded under the single concept word "drive". Stemming is the process of reducing words to their word stem, base, or root form.
It suffices to say that full-text search is not perfect. It's likely to retrieve many documents that are not relevant (false positives) to the intended search query. However, there are some techniques based on Bayesian algorithms that can help reduce such problems.
To take advantage of Postgres full-text search with Django, add django.contrib.postgres
to your INSTALLED_APPS
list:
INSTALLED_APPS = [
...
"django.contrib.postgres", # new
]
Next, let's look at two quick examples of full-text search, on a single field and on multiple fields.
Update the get_queryset
function under the SearchResultsList
view function like so:
class SearchResultsList(ListView):
model = Quote
context_object_name = "quotes"
template_name = "search.html"
def get_queryset(self):
query = self.request.GET.get("q")
return Quote.objects.filter(quote__search=query)
Here, we set up full-text search against a single field -- the quote field.
As you can see, it takes similar words into account. In the above example, "ponies" and "pony" are treated as similar words.
To search against multiple fields and on related models, you can use the SearchVector
class.
Again, update SearchResultsList
:
class SearchResultsList(ListView):
model = Quote
context_object_name = "quotes"
template_name = "search.html"
def get_queryset(self):
query = self.request.GET.get("q")
return Quote.objects.annotate(search=SearchVector("name", "quote")).filter(
search=query
)
To search against multiple fields, you annotated the queryset using a SearchVector
. The vector is the data that you're searching for, which has been converted into a form that is easy to search. In the example above, this data is the name
and quote
fields in your database.
Make sure to add the import:
from django.contrib.postgres.search import SearchVector
Try some searches out.
In this section, you'll combine several methods such as SearchVector, SearchQuery, and SearchRank to produce a very robust search that uses both stemming and ranking.
Again, stemming is the process of reducing words to their word stem, base, or root form. With stemming, words like "child" and "children" will be treated as similar words. Ranking, on the other hand, allows us to order results by relevancy.
Update SearchResultsList
:
class SearchResultsList(ListView):
model = Quote
context_object_name = "quotes"
template_name = "search.html"
def get_queryset(self):
query = self.request.GET.get("q")
search_vector = SearchVector("name", "quote")
search_query = SearchQuery(query)
return (
Quote.objects.annotate(
search=search_vector, rank=SearchRank(search_vector, search_query)
)
.filter(search=search_query)
.order_by("-rank")
)
What's happening here?
SearchVector
- again you used a search vector to search against multiple fields. The data is converted into another form since you're no longer just searching the raw text like you did when icontains
was used. Therefore, with this, you will be able to search plurals easily. For example, searching for "flask" and "flasks" will yield the same search because they are, well, basically the same thing.SearchQuery
- translates the words provided to us as a query from the form, passes them through a stemming algorithm, and then it looks for matches for all of the resulting terms.SearchRank
- allows us to order the results by relevancy. It takes into account how often the query terms appear in the document, how close the terms are on the document, and how important the part of the document is where they occur.Add the imports:
from django.contrib.postgres.search import SearchVector, SearchQuery, SearchRank
Compare the results from the basic search to those of the full-text search. There's a clear difference. In the full-text search, the result with the highest rank is shown first. This is the power of SearchRank. Combining SearchVector, SearchQuery, and SearchRank is a quick way to produce a much more powerful and precise search than the basic search.
Full-text search gives us the ability to add more importance to some fields in our table in the database over other fields. We can achieve this by adding weights to our queries.
The weight should be one of the following letters: D, C, B, or A. By default, these weights correspond to the numbers 0.1, 0.2, 0.4, and 1.0, respectively.
Update SearchResultsList:

class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        search_vector = SearchVector("name", weight="B") + SearchVector(
            "quote", weight="A"
        )
        search_query = SearchQuery(query)
        return (
            Quote.objects.annotate(rank=SearchRank(search_vector, search_query))
            .filter(rank__gte=0.3)
            .order_by("-rank")
        )

Here, you added weights to the SearchVector using both the name and quote fields. Weights of 0.4 and 1.0 were applied to the name and quote fields, respectively. Therefore, quote matches will prevail over name matches. Finally, you filtered the results to display only the ones with a rank greater than or equal to 0.3.
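As a rough illustration of why a match in a heavier field outranks a match in a lighter one, here is a small plain-Python sketch. The toy_rank function and the sample rows are hypothetical, and the scoring (weight multiplied by the number of occurrences) is a simplified stand-in, not the formula Postgres uses for ranking.

```python
# Default weight letters and their numeric values, as described above.
WEIGHTS = {"D": 0.1, "C": 0.2, "B": 0.4, "A": 1.0}


def toy_rank(term: str, fields: dict[str, tuple[str, str]]) -> float:
    """Sum weight * occurrences of `term` across weighted fields.

    `fields` maps a field name to (text, weight_letter). This is a
    simplified stand-in for illustration, not the real ranking formula.
    """
    term = term.lower()
    score = 0.0
    for text, letter in fields.values():
        occurrences = text.lower().split().count(term)
        score += WEIGHTS[letter] * occurrences
    return score


# Hypothetical rows: one matches in the name field, one in the quote field.
name_match = {"name": ("Flask Fan", "B"), "quote": ("Django is great", "A")}
quote_match = {"name": ("Jane Doe", "B"), "quote": ("Flask is great", "A")}

print(toy_rank("flask", name_match))   # 0.4
print(toy_rank("flask", quote_match))  # 1.0
```

In this toy model, the row that matches in the quote field scores higher, so it would come first when ordering by descending rank.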
In this section, you'll add a little preview of your search results via SearchHeadline. This will highlight the search terms in each result.
Update SearchResultsList again:

class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        search_vector = SearchVector("name", "quote")
        search_query = SearchQuery(query)
        search_headline = SearchHeadline("quote", search_query)
        return (
            Quote.objects.annotate(
                search=search_vector,
                rank=SearchRank(search_vector, search_query),
            )
            .annotate(headline=search_headline)
            .filter(search=search_query)
            .order_by("-rank")
        )

SearchHeadline takes in the field you want to preview, in this case the quote field, along with the query. The matched terms will be in bold.
Make sure to add the import:
from django.contrib.postgres.search import SearchVector, SearchQuery, SearchRank, SearchHeadline
Before trying out some searches, update the <li></li> in quotes/templates/search.html like so:
<li>{{ quote.headline | safe }} - <b>By <i>{{ quote.name }}</i></b></li>
Now, instead of showing the quotes as you did before, only a preview of the full quote field is displayed along with the highlighted search query.
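To get a feel for what a highlighted headline looks like, here is a toy plain-Python sketch. The toy_headline function is hypothetical; its start_sel and stop_sel parameters are named after the options that SearchHeadline accepts, but the matching here is a simple whole-word comparison with no stemming. Because the template marks the headline as safe, the sketch escapes the text before wrapping the matches.

```python
import html
import re


def toy_headline(
    text: str, term: str, start_sel: str = "<b>", stop_sel: str = "</b>"
) -> str:
    """Escape `text`, then wrap each whole-word occurrence of `term` in markers.

    Hypothetical helper for illustration; it does no stemming.
    """
    escaped = html.escape(text)
    pattern = re.compile(rf"\b{re.escape(html.escape(term))}\b", re.IGNORECASE)
    # Keep the original casing of each match and surround it with the markers.
    return pattern.sub(lambda m: f"{start_sel}{m.group(0)}{stop_sel}", escaped)


print(toy_headline("Flask is fun & Flask is small", "flask"))
# <b>Flask</b> is fun &amp; <b>Flask</b> is small
```

Passing different markers, for example start_sel="<mark>" and stop_sel="</mark>", changes how the matches are wrapped.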
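Full-text search is an intensive process. To combat slow performance, you can:
1. Save the search vectors to the database with SearchVectorField. Rather than converting the strings to search vectors on the fly, you store the processed vectors in a separate field and update that field whenever there's an insert or update to either the quote or name fields.
2. Create a database index on that field to speed up data retrieval.
To learn more about performance with full-text search, review the Performance section from the Django docs.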
Start by adding a new SearchVectorField field to the Quote model in quotes/models.py:

from django.contrib.postgres.search import SearchVectorField  # new
from django.db import models


class Quote(models.Model):
    name = models.CharField(max_length=250)
    quote = models.TextField(max_length=1000)
    search_vector = SearchVectorField(null=True)  # new

    def __str__(self):
        return self.quote

Create the migration file:
$ docker-compose exec web python manage.py makemigrations
The field can only be populated once a row's name and quote values exist in the database. Thus, we need to add a trigger to update the search_vector field whenever the quote or name fields are updated. To achieve this, create a custom migration file in "quotes/migrations" called 0003_search_vector_trigger.py:

from django.contrib.postgres.search import SearchVector
from django.db import migrations


def compute_search_vector(apps, schema_editor):
    Quote = apps.get_model("quotes", "Quote")
    Quote.objects.update(search_vector=SearchVector("name", "quote"))


class Migration(migrations.Migration):

    dependencies = [
        ("quotes", "0002_quote_search_vector"),
    ]

    operations = [
        migrations.RunSQL(
            sql="""
            CREATE TRIGGER search_vector_trigger
            BEFORE INSERT OR UPDATE OF name, quote, search_vector
            ON quotes_quote
            FOR EACH ROW EXECUTE PROCEDURE
            tsvector_update_trigger(
                search_vector, 'pg_catalog.english', name, quote
            );
            UPDATE quotes_quote SET search_vector = NULL;
            """,
            reverse_sql="""
            DROP TRIGGER IF EXISTS search_vector_trigger
            ON quotes_quote;
            """,
        ),
        migrations.RunPython(
            compute_search_vector, reverse_code=migrations.RunPython.noop
        ),
    ]

Depending on your project structure, you may need to update the name of the previous migration file in dependencies.
Apply the migrations:
$ docker-compose exec web python manage.py migrate
To use the new field for searches, update SearchResultsList like so:

class SearchResultsList(ListView):
    model = Quote
    context_object_name = "quotes"
    template_name = "search.html"

    def get_queryset(self):
        query = self.request.GET.get("q")
        return Quote.objects.filter(search_vector=query)

Update the <li></li> in quotes/templates/search.html again:
<li>{{ quote.quote | safe }} - <b>By <i>{{ quote.name }}</i></b></li>
Finally, let's set up a GIN index on the search_vector field with GinIndex.
Update the Quote model:

from django.contrib.postgres.indexes import GinIndex  # new
from django.contrib.postgres.search import SearchVectorField
from django.db import models


class Quote(models.Model):
    name = models.CharField(max_length=250)
    quote = models.TextField(max_length=1000)
    search_vector = SearchVectorField(null=True)

    def __str__(self):
        return self.quote

    # new
    class Meta:
        indexes = [
            GinIndex(fields=["search_vector"]),
        ]

Create and apply the migrations one last time:
$ docker-compose exec web python manage.py makemigrations
$ docker-compose exec web python manage.py migrate
Test it out.
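GIN stands for Generalized Inverted Index. The rough idea behind an inverted index can be sketched in plain Python: instead of scanning every row for a word, you keep a mapping from each word to the rows that contain it. The build_inverted_index function and the sample rows below are hypothetical, and this is only a conceptual sketch, not how Postgres stores the index on disk.

```python
from collections import defaultdict


def build_inverted_index(rows: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase word to the set of row ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, text in rows.items():
        for word in text.lower().split():
            index[word].add(row_id)
    return index


# Hypothetical rows keyed by primary key.
rows = {
    1: "flask is small",
    2: "django is batteries included",
    3: "flask and django",
}
index = build_inverted_index(rows)

# A lookup is a single dictionary access instead of a scan over every row.
print(sorted(index["flask"]))   # [1, 3]
print(sorted(index["django"]))  # [2, 3]
```

The cost of building and maintaining the mapping is paid on writes, which is the trade-off for faster reads.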
In this tutorial, you were guided through adding basic and full-text search to a Django application. We also took a look at how to optimize the full-text search functionality by adding a search vector field and a database index.
Grab the complete code from the django-search repo.
Original article source at: https://testdriven.io/