I'm using Elasticsearch 5.6.*.
I'm looking for a way to make one of my indices manage its storage constraints automatically. The index grows quickly, by about 1 million documents per day.
For example: I would define the maximum number of documents or the maximum index size as a variable 'n'. I'd write a scheduler that checks whether the index has reached 'n'. If it has, I'd want to delete the oldest 'x' documents (based on time).
I have a couple of questions here:
First, I don't want to delete too much or too little. How would I know what 'x' should be? Can I simply tell Elasticsearch "Hey, delete the oldest documents worth 5GB"? My intent is simply to free up a fixed amount of storage. Is this possible?
Secondly, what's the best practice here? I don't want to reinvent the wheel, so if there's anything that already does the job (e.g. Curator, which I've only recently heard about), I'd be happy to use it.
In your case, the best practice is to work with time-based indices (daily, weekly or monthly), whichever makes sense for the amount of data you have and the retention you want. You can also use the Rollover API to decide when a new index needs to be created (based on time, number of documents or index size; check the Rollover documentation for your version, since the size condition may not be available on 5.6).
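To make the rollover idea concrete, here is a minimal sketch that only builds the request you would send to the Rollover API. The alias name `logs_write` and the thresholds are assumptions for illustration, and I've stuck to the `max_age` and `max_docs` conditions; sending the request with an HTTP client is left out.

```python
import json


def build_rollover_request(alias, max_age="1d", max_docs=1000000):
    """Return (method, path, body) for a Rollover API call on `alias`.

    `alias` must be a write alias pointing at the current index.
    The default thresholds are illustrative, not recommendations.
    """
    conditions = {"max_age": max_age, "max_docs": max_docs}
    body = json.dumps({"conditions": conditions}, sort_keys=True)
    return "POST", "/{}/_rollover".format(alias), body


# "logs_write" is a hypothetical alias name.
method, path, body = build_rollover_request("logs_write")
print(method, path)
print(body)
```

A scheduler can issue this request periodically; Elasticsearch only creates a new index when at least one of the conditions is met, so calling it more often than needed is harmless.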
It is much easier to delete an entire index than to delete documents matching certain conditions within an index. If you do the latter, the documents will be deleted, but the space won't be freed until the underlying segments get merged, whereas if you delete an entire time-based index, you're guaranteed to free up space.
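With daily indices, retention boils down to picking the index names that are older than your cutoff and deleting them whole. Below is a rough sketch of that selection step, assuming a hypothetical naming scheme of `events-YYYY.MM.DD`; the prefix, the date format and the 7-day retention are all assumptions, not something your cluster necessarily uses.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, prefix, today, keep_days):
    """Return the names of daily indices strictly older than the cutoff.

    Names that do not start with `prefix`, or whose suffix is not a
    YYYY.MM.DD date, are ignored rather than treated as expired.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # e.g. "events-latest": not a dated index
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


# Hypothetical index names, as you might get from listing your indices.
names = [
    "events-2017.09.30",
    "events-2017.10.01",
    "events-2017.10.03",
    "events-2017.10.09",
    "events-latest",
    "other-2017.01.01",
]
to_delete = expired_indices(names, "events-", date(2017, 10, 10), keep_days=7)
print(to_delete)
```

Each name returned can then be removed with a single `DELETE /<index-name>` request. Curator provides this kind of age-based index filtering out of the box, so a script like this is mainly useful for understanding what it does or for very simple setups.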