
Search¶

Read the Docs uses Elasticsearch instead of the built-in Sphinx search to provide better search results. Documents are indexed into Elasticsearch, and searches are made through the API. All of the search code is open source and lives in the GitHub repository. We currently use Elasticsearch 6.3.

Local Development Configuration¶

Elasticsearch is installed and run as part of the development setup.

Indexing into Elasticsearch¶

To use search, you need to index data into Elasticsearch. Run the reindex_elasticsearch management command:

inv docker.manage reindex_elasticsearch

For performance reasons, we implemented our own management command rather than using the built-in one provided by the django-elasticsearch-dsl package.

Auto Indexing¶

By default, auto indexing is turned off in development mode. To turn it on, set ELASTICSEARCH_DSL_AUTOSYNC to True in the readthedocs/settings/dev.py file. After that, whenever documentation builds successfully or a project is added, the search index updates automatically.
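As a minimal sketch, the change amounts to a single assignment. How readthedocs/settings/dev.py organizes its settings is not shown here, so the bare module-level form below is an assumption; put the line wherever the file already defines its other settings.

```python
# Sketch of the relevant line for readthedocs/settings/dev.py.
# Assumption: written as a bare assignment; adapt it to however the
# file already declares its settings.
ELASTICSEARCH_DSL_AUTOSYNC = True
```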

Manual Elasticsearch installation and setup¶

Usually you can rely on the Docker Compose development setup, which includes Elasticsearch. However, if you're developing or testing Read the Docs' search integration, you may need to install and set up Elasticsearch manually.

You need to install and run Elasticsearch version 6.3 on your local development machine. Installation instructions are available in the Elasticsearch documentation. Alternatively, you can start an Elasticsearch Docker container by running the following command:

docker run -p 9200:9200 -p 9300:9300 \
       -e "discovery.type=single-node" \
       docker.elastic.co/elasticsearch/elasticsearch:6.3.2
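Once the container is running, one way to confirm that the expected version is listening on port 9200 is to read the JSON that Elasticsearch serves at its root URL. The sketch below is not part of the Read the Docs codebase: the function names are hypothetical, and it assumes the default http://localhost:9200 address exposed by the command above.

```python
import json
from urllib.request import urlopen


def parse_version(info):
    """Return (major, minor) from the JSON body served at the root URL."""
    number = info["version"]["number"]  # for example "6.3.2"
    major, minor = number.split(".")[:2]
    return int(major), int(minor)


def check_local_elasticsearch(url="http://localhost:9200", expected=(6, 3)):
    """Fetch the root endpoint and compare its version with `expected`."""
    with urlopen(url, timeout=5) as response:
        info = json.load(response)
    return parse_version(info) == expected
```

Calling check_local_elasticsearch() returns True when a 6.3.x node answers on the default port, and raises a URLError if nothing is listening there.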

Architecture¶

The search architecture is divided into two parts:

  • One part is responsible for indexing the documents and projects (documents.py)

  • The other part is responsible for querying the Index to show the proper results to users (faceted_search.py)

We use the django-elasticsearch-dsl package for our Document abstraction. django-elasticsearch-dsl is a wrapper around elasticsearch-dsl for easy configuration with Django.

Indexing¶

All Sphinx documents are indexed into Elasticsearch after a successful build. Currently, we do not index MkDocs documents into Elasticsearch, but any help is welcome.

Troubleshooting¶

If you get an error like:

RequestError(400, 'search_phase_execution_exception', 'failed to create query: ...

you can fix this by deleting the page index:

inv docker.manage 'search_index --delete'

Note

You’ll need to reindex the projects after this.

How we index documentation¶

After any build finishes successfully, HTMLFile objects are created for each of the HTML files, and the old version's HTMLFile objects are deleted. By default, the django-elasticsearch-dsl package listens to the post_save/post_delete signals to index/delete documents, but this has performance drawbacks because it sends an HTTP request whenever an HTMLFile object is created or deleted. To optimize performance, bulk_post_create and bulk_post_delete signals are dispatched with a list of HTMLFile objects, so it's possible to bulk index documents in Elasticsearch (bulk_post_create is dispatched for created objects and bulk_post_delete for deleted objects). Both signals pass the list of HTMLFile instances in the instance_list parameter.

We listen to the bulk_post_create and bulk_post_delete signals in our Search application and index/delete the documentation content from the HTMLFile instances.
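The motivation for the bulk signals can be illustrated with a small, self-contained sketch. It uses neither Django signals nor a real Elasticsearch client: CountingClient and both handler functions are hypothetical stand-ins that only count how many requests each approach would make for the same set of files.

```python
class CountingClient:
    """Stand-in for an Elasticsearch client that only counts requests."""

    def __init__(self):
        self.requests = 0
        self.indexed = []

    def index(self, doc):
        # One request per document.
        self.requests += 1
        self.indexed.append(doc)

    def bulk_index(self, docs):
        # One request for the whole batch.
        self.requests += 1
        self.indexed.extend(docs)


def on_post_save(client, instance):
    """Per-object handler: runs once for every saved object."""
    client.index(instance)


def on_bulk_post_create(client, instance_list):
    """Bulk handler: runs once with all created objects."""
    client.bulk_index(instance_list)


# Plain strings stand in for HTMLFile instances.
html_files = ["index.html", "install.html", "api.html"]

per_object = CountingClient()
for html_file in html_files:
    on_post_save(per_object, html_file)

bulk = CountingClient()
on_bulk_post_create(bulk, instance_list=html_files)

print(per_object.requests, bulk.requests)  # prints "3 1"
```

Both clients end up with the same documents, but the per-object path costs one request per file while the bulk path costs one request per build.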

How we index projects¶

We also index project information in our search index so that users can search for projects from the main site. We listen to the post_save and post_delete signals of the Project model and index into or delete from Elasticsearch accordingly.

Elasticsearch Document¶

elasticsearch-dsl provides a model-like wrapper for the Elasticsearch document. As required by django-elasticsearch-dsl, our documents are defined in the readthedocs/search/documents.py file.

ProjectDocument: Used for indexing projects. The django-elasticsearch-dsl signal listener listens to the post_save and post_delete signals of the Project model and indexes into or deletes from Elasticsearch accordingly.

PageDocument: Used for indexing the documentation of projects. As mentioned above, our Search app listens to the bulk_post_create and bulk_post_delete signals and indexes/deletes documentation in Elasticsearch. The signal listeners are in the readthedocs/search/signals.py file. Both signals are dispatched after a successful documentation build.

The fields and Elasticsearch datatypes are specified in the PageDocument. The indexable data is taken from the processed_json property of HTMLFile. This property provides a Python dictionary with document data such as title, sections, and path.
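To make the shape of that data concrete, here is a hedged sketch of turning a processed_json-style dictionary into one indexable document. Only the title, sections, and path keys come from the description above. The build_page_document function, the project and version arguments, and the inner layout of each section (a title plus a content string) are assumptions made for illustration, not the actual PageDocument implementation.

```python
def build_page_document(processed_json, project, version):
    """Flatten a processed_json-style dict into one indexable document."""
    return {
        "project": project,
        "version": version,
        "path": processed_json.get("path", ""),
        "title": processed_json.get("title", ""),
        # Keep only sections that have text to search.
        "sections": [
            {"title": s.get("title", ""), "content": s["content"]}
            for s in processed_json.get("sections", [])
            if s.get("content")
        ],
    }


# Hypothetical input shaped like the dictionary described above.
example = {
    "path": "install",
    "title": "Installation",
    "sections": [
        {"title": "Requirements", "content": "Python and Docker."},
        {"title": "Empty section", "content": ""},
    ],
}

document = build_page_document(example, project="docs", version="latest")
```

Here document keeps the Requirements section and drops the empty one, since a section without text contributes nothing to search.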


© Copyright 2010-2020, Read the Docs, Inc & contributors Revision 5b53a003.
