How to handle multiple updates / deletes with Elasticsearch?

Marvin Saldinger · Sep 2, 2014

I need to update or delete several documents.

When I update, I do this:

  1. I first search for the documents, setting a large limit on the number of returned results (say, size: 10000).
  2. For each of the returned documents, I modify certain values.
  3. I resend the whole modified list to Elasticsearch (bulk index).

This repeats until step 1 no longer returns results.
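Step 3 of that loop amounts to turning the search hits into a bulk request body. A minimal sketch of that part, assuming hits shaped like an Elasticsearch 1.x search response (`_index`, `_type`, `_id`, `_source`); `build_bulk_index_body` and the `modify` callback are hypothetical names, and actually sending the body (a POST to `/_bulk`) is left to whatever client you use:

```python
import json
from typing import Callable, Dict, Iterable, List


def build_bulk_index_body(hits: Iterable[Dict], modify: Callable[[Dict], Dict]) -> str:
    """Turn search hits into a newline-delimited body for the _bulk endpoint.

    Hypothetical helper: each hit is assumed to carry _index, _type, _id and
    _source, as in an Elasticsearch 1.x search response.
    """
    lines: List[str] = []
    for hit in hits:
        # Action line: re-index the document under its original coordinates.
        action = {"index": {"_index": hit["_index"], "_type": hit["_type"], "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        # Source line: the full document after the caller's changes.
        # A shallow copy is passed so the original hit is not modified.
        lines.append(json.dumps(modify(dict(hit["_source"]))))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)
```

Because the `index` action replaces the whole document, each modified `_source` has to be complete, which is why this approach resends everything.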

When I delete, I do this:

  1. I first search for the documents, setting a large limit on the number of returned results (say, size: 10000).
  2. I delete each found document by sending its _id to Elasticsearch, one request per document (10,000 requests).

This repeats until step 1 no longer returns results.

Is this the right way to perform these updates?

When deleting, is there a way to send several ids at once and delete multiple documents in a single request?

Answer

ThomasC · Sep 2, 2014

For your mass index/update operations, if you are not already using it (I'm not sure from your description), take a look at the bulk API documentation. It is tailored for this kind of job.
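If only a few fields change per document, the bulk API also accepts `update` actions whose payload is a partial `doc`, so you don't have to resend whole documents. A sketch that builds such a body; `build_bulk_update_body` is a hypothetical name, and `_type` is included because the API version discussed here still uses mapping types:

```python
import json
from typing import Dict, Iterable, Tuple


def build_bulk_update_body(index: str, doc_type: str,
                           changes: Iterable[Tuple[str, Dict]]) -> str:
    """Build a _bulk body of partial updates from (id, changed_fields) pairs.

    Hypothetical helper: only the changed fields are sent for each document.
    """
    lines = []
    for doc_id, partial in changes:
        # Action line identifying the document to update.
        lines.append(json.dumps({"update": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        # Payload line: "doc" is merged into the stored _source by the server.
        lines.append(json.dumps({"doc": partial}))
    # Newline-terminate every line, including the last.
    return "".join(line + "\n" for line in lines)
```

For example, `build_bulk_update_body("indexName", "typeName", [("1", {"status": "done"})])` produces one `update` action line followed by one `doc` line.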

If you want to retrieve lots of documents in small batches, you should use a scan-and-scroll search instead of from/size. Related information can be found here.

To sum up:

  • the scroll API keeps a search context open so you can iterate over a large result set efficiently, one batch at a time
  • the scan search type disables sorting, which is costly
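The iteration itself is a simple loop: take the `_scroll_id` from each reply and send it back until a batch comes back empty. A client-agnostic sketch, assuming replies decoded into dicts; `iterate_scroll` and the `fetch_next` callable (which would send the id to the scroll endpoint) are hypothetical. With `search_type=scan`, the first reply contains no hits, only a scroll id, and `size` applies per shard, so the loop must not stop on an empty first batch:

```python
from typing import Callable, Dict, Iterator, List


def iterate_scroll(first_response: Dict,
                   fetch_next: Callable[[str], Dict]) -> Iterator[List[Dict]]:
    """Yield batches of hits, following the scroll id returned by each reply.

    first_response: decoded reply to the initial search (e.g. scan + scroll).
    fetch_next: hypothetical callable that sends a scroll id to the scroll
    endpoint and returns the decoded reply.
    """
    hits = first_response["hits"]["hits"]
    scroll_id = first_response.get("_scroll_id")
    # A scan search returns no hits up front, so only yield a non-empty batch
    # and keep going regardless.
    if hits:
        yield hits
    while scroll_id:
        response = fetch_next(scroll_id)
        hits = response["hits"]["hits"]
        if not hits:
            break  # an empty batch means the scroll is exhausted
        yield hits
        # Always continue with the most recent scroll id.
        scroll_id = response.get("_scroll_id")
```

Because the scroll works on a snapshot taken at the initial search, documents you update or delete while iterating don't shift the batches, unlike re-running the same query until it returns nothing.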

Give it a try: depending on the data volume, it could improve the performance of your batch operations.

For deletes, you can use the same _bulk API to send multiple delete operations in one request.

Each line has the following format (the body is newline-delimited, and its last line must also end with a newline):

{ "delete" : { "_index" : "indexName", "_type" : "typeName", "_id" : "1" } }
{ "delete" : { "_index" : "indexName", "_type" : "typeName", "_id" : "2" } }
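Instead of 10,000 single delete requests, the ids from each search or scroll batch can be packed into a few bulk bodies of the format above. A sketch assuming string ids; `bulk_delete_bodies` and its default `batch_size` of 1,000 are arbitrary illustrative choices, not Elasticsearch defaults:

```python
import json
from typing import Iterable, Iterator, List


def bulk_delete_bodies(index: str, doc_type: str, ids: Iterable[str],
                       batch_size: int = 1000) -> Iterator[str]:
    """Yield one _bulk body per batch of at most batch_size delete actions.

    Hypothetical helper; batch_size is an illustrative default.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[str] = []
    for doc_id in ids:
        # Delete actions consist of the action line only; there is no source line.
        batch.append(json.dumps({"delete": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        if len(batch) == batch_size:
            yield "".join(line + "\n" for line in batch)
            batch = []
    # Flush the final, possibly smaller, batch.
    if batch:
        yield "".join(line + "\n" for line in batch)
```

Each yielded string would be POSTed to `/_bulk`. A bulk request can partially fail, so check the `errors` flag and the per-item statuses in the response.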