what is the better way to index data from Oracle/relational tables into elastic search?

Rakesh Shiriyara picture Rakesh Shiriyara · Dec 15, 2014 · Viewed 10.5k times · Source

What are the options to index large data from Oracle DB to elastic search cluster? Requirement is to index 300Million records one time into multiple indexes and also incremental updates having around approximate 1 Million changes every day.

I have tried JDBC plugin for elasticsearch river/feeder, both seems to be running inside or require locally running elastic search instance. Please let me know if there is any better option for running elastic search indexer as a standalone job (probably java based). Any suggestions will be very helpful. Thanks.

Answer

jhilden picture jhilden · Dec 16, 2014

We use ES as a reporting db and when new records are written to SQL we take the following action to get them into ES:

  1. Write the primary key into a queue (we use rabbitMQ)
  2. Rabbit picks up the primary key (when it has time) and queries the relation DB to get the info it needs and then writes the data into ES

This process works great because it handles both new data and old data. For old data just write a quick script to write 300M primary keys into rabbit and you're done!