Elasticsearch updates are not immediate. How do you wait for Elasticsearch to finish updating its index?

user916367 · Nov 18, 2016

I'm attempting to improve performance on a suite that tests against ElasticSearch.

The tests take a long time because Elasticsearch does not update its indexes immediately after a write. For instance, the following code runs without raising an assertion error.

from elasticsearch import Elasticsearch
elasticsearch = Elasticsearch('es.test')

# Assuming that this is a clean and empty Elasticsearch instance
elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    body={
        # ...
    }
)

# search() returns a response dict; the matching documents are in ['hits']['hits']
results = elasticsearch.search(index='blog')
assert not results['hits']['hits']
# results are not populated
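
For context: Elasticsearch search is near-real-time. A write is acknowledged as soon as it is indexed, but it only becomes searchable after the index is next refreshed, which by default happens about once per second. The toy class below is a simplified, hypothetical model of that visibility rule only; the names ToyNrtIndex, index, refresh and search are illustrative and say nothing about Elasticsearch internals.

```python
# Toy model (NOT Elasticsearch's implementation) of near-real-time search:
# writes land in a buffer and only become searchable after a refresh.


class ToyNrtIndex:
    def __init__(self):
        self._buffer = {}      # written, acknowledged, but not yet searchable
        self._searchable = {}  # what search() can see

    def index(self, doc_id, doc):
        # The write "succeeds" immediately, but is only buffered.
        self._buffer[doc_id] = doc

    def refresh(self):
        # A refresh publishes everything buffered so far to searchers.
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def search(self):
        return list(self._searchable.values())


idx = ToyNrtIndex()
idx.index(1, {"title": "hello"})
print(idx.search())  # [] : write acknowledged, not yet visible
idx.refresh()
print(idx.search())  # [{'title': 'hello'}]
```

This is why a search issued right after the update sees nothing: the write succeeded, but no refresh has happened yet.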

Currently, our hacked-together solution to this issue is to drop a time.sleep call into the code to give Elasticsearch some time to update its indexes.

from time import sleep
from elasticsearch import Elasticsearch
elasticsearch = Elasticsearch('es.test')

# Assuming that this is a clean and empty Elasticsearch instance
elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    body={
        # ...
    }
)

# Don't want to use sleep functions
sleep(1)

results = elasticsearch.search(index='blog')
assert len(results['hits']['hits']) == 1
# results are now populated

Obviously this isn't great. It's failure-prone: if Elasticsearch ever takes longer than a second to update its indexes, however unlikely that is, the test will fail. It's also extremely slow when you're running hundreds of tests like this.
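
A less fragile pattern than a fixed sleep(1) is to poll for the expected state with a deadline: the test continues as soon as the condition holds and fails only after a generous timeout. The helper below is a generic sketch, not an Elasticsearch feature; the name wait_until and its default timeout and interval are hypothetical choices.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that truthy value, or raises TimeoutError once `timeout`
    seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        # Short pause between polls so we don't hammer the server.
        time.sleep(interval)
```

Applied to the test above, it might look like hits = wait_until(lambda: elasticsearch.search()['hits']['hits']), which returns as soon as the document is visible instead of always waiting a full second. It still polls, though, so it is a fallback rather than a true "notify me when indexed" mechanism.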

My attempt to solve the issue has been to poll the cluster's pending tasks to see whether any work is left to be done. However, this doesn't work: the following code still runs without an assertion error.

from elasticsearch import Elasticsearch
elasticsearch = Elasticsearch('es.test')

# Assuming that this is a clean and empty Elasticsearch instance
elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    body={
        # ...
    }
)

# Busy-wait while the cluster reports any pending tasks
while elasticsearch.cluster.pending_tasks()['tasks']:
    pass

results = elasticsearch.search(index='blog')
assert not results['hits']['hits']
# results are not populated

So, back to my original question: Elasticsearch updates are not immediate, so how do you wait for Elasticsearch to finish updating its index?

Answer

TinkerTank · Nov 18, 2016

As of version 5.0.0, Elasticsearch has an option:

 ?refresh=wait_for

on the Index, Update, Delete, and Bulk APIs. With it, the request doesn't receive a response until the change is visible to search. (Yay!)

See https://www.elastic.co/guide/en/elasticsearch/reference/master/docs-refresh.html for more information.
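
The refresh options differ in an important way: refresh=true forces a refresh before the request returns (convenient, but costly if done on every write), whereas refresh=wait_for forces nothing and simply holds the response until the next periodic refresh makes the change visible. The sketch below models that difference with a background thread; ToyIndex, refresh_interval and the generation counter are hypothetical illustrations, not how Elasticsearch implements it.

```python
import threading
import time


class ToyIndex:
    """Toy near-real-time index with a periodic background refresh.

    Mimics only the visible behaviour of refresh=False (default),
    refresh=True and refresh="wait_for"; not Elasticsearch internals.
    """

    def __init__(self, refresh_interval=0.2):
        self._cond = threading.Condition()
        self._buffer = {}      # written but not yet searchable
        self._searchable = {}  # visible to search()
        self._generation = 0   # incremented on every refresh
        self._interval = refresh_interval
        threading.Thread(target=self._refresh_loop, daemon=True).start()

    def _refresh_locked(self):
        # Caller must hold self._cond. Publish the buffer and wake waiters.
        self._searchable.update(self._buffer)
        self._buffer.clear()
        self._generation += 1
        self._cond.notify_all()

    def _refresh_loop(self):
        # Periodic refresh, like index.refresh_interval.
        while True:
            time.sleep(self._interval)
            with self._cond:
                self._refresh_locked()

    def index(self, doc_id, doc, refresh=False):
        with self._cond:
            self._buffer[doc_id] = doc
            if refresh is True:
                # refresh=true: force a refresh before responding.
                self._refresh_locked()
            elif refresh == "wait_for":
                # refresh=wait_for: block until the next scheduled refresh,
                # which is guaranteed to include this write.
                target = self._generation + 1
                self._cond.wait_for(lambda: self._generation >= target)
            # refresh=False: respond now; the doc becomes visible later.

    def search(self):
        with self._cond:
            return list(self._searchable.values())


toy = ToyIndex(refresh_interval=0.2)
toy.index(1, {"title": "a"})                      # default: returns at once
toy.index(2, {"title": "b"}, refresh="wait_for")  # returns after next refresh
print(len(toy.search()))                          # 2: both writes now visible
```

Because wait_for piggybacks on the scheduled refresh, each such call can take up to one refresh interval, but unlike a fixed sleep it returns as soon as the change is visible.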

Edit: the Python elasticsearch client also exposes this parameter: https://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.index

Change your elasticsearch.update to:

elasticsearch.update(
    index='blog',
    doc_type='blog',
    id=1,
    refresh='wait_for',  # respond only once the change is searchable
    body={
        # ...
    }
)

and you shouldn't need any sleep or polling.
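
For a suite that writes many documents per test, waiting on every single request still adds up. One alternative is to index the fixtures without waiting and then call the refresh API once (indices.refresh in the Python client) before searching; passing refresh='wait_for' on a single Bulk request has a similar effect. A minimal sketch, assuming the same 5.x/6.x-era elasticsearch-py client as above (hence doc_type); load_fixtures is a hypothetical helper name.

```python
def load_fixtures(es, index, doc_type, docs):
    """Index many test documents, then make them all searchable at once.

    `es` is assumed to be an elasticsearch-py client of the 5.x/6.x era;
    `docs` maps document ids to document bodies.
    """
    for doc_id, body in docs.items():
        # No per-request refresh: each write returns as soon as it is indexed.
        es.index(index=index, doc_type=doc_type, id=doc_id, body=body)
    # One explicit refresh makes every write above visible to search.
    es.indices.refresh(index=index)
```

Called from a test's setup, for example load_fixtures(elasticsearch, 'blog', 'blog', {1: {...}}), it replaces one refresh per write with one refresh per test. Forcing refreshes is fine in a test environment but is generally discouraged for production write traffic.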