How to sync MSSQL to Elasticsearch?

Tadej picture Tadej · Aug 16, 2017 · Viewed 11.3k times · Source

Every time I Google this, I find the "river" approach which is deprecated. I'm using Dapper if this is somehow a helpful information.

So what's the solution for this these days?

Answer

user3956566 picture user3956566 · Aug 19, 2017

Your question is on the broad side - so this is a pointer to some options.

Elastic search is used to query the database and analyse the data.

In the article Deprecating Rivers:

Client libraries

For more than a year, we've had official client libraries for Elasticsearch in most programming languages. It means that hooking into your application and getting data through an existing codebase should be relatively simple. This technique also allows to easily munge the data before it gets to Elasticsearch. A common example is an application that already used an ORM to map the domain model to a database, and hooking and indexing the domain model back to Elasticsearch tends to be simple to implement.

There's extensive documentation on how to use elastic search in:

Elasticsearch.Net.

The docs will address the following:

Install the package:

PM> Install-Package Elasticsearch.Net

Connection

var node = new Uri("http://mynode.example.com:8082/apiKey");  
var config = new ConnectionConfiguration(node);  
var client = new ElasticsearchClient(config);`  

Security

Pooling and Failover

Building requests

This is what you'll need to develop.

Response handling

Error handling

Plugins

Logstash can also be used instead of Rivers, from which various plugins have been developed.

Also, Logstash, or similar tools, can be used to ship data into Elasticsearch. For example, some of the rivers Elasticsearch came with are now implemented as Logstash plugins (like the CouchDB one) in the forthcoming Logstash 1.5.

Extra reading

Although this is a different language and framework - the blog Advanced Search for Your Legacy Application by David Pilato and information may be helpful to browse. He recommends doing it in the application layer.

To address issues from the comments.

Data changes can be tracked.

SQL Server provides an inbuilt system to track data changes, an effective means of automatically tracking changes to the data without having to implement manual methods to check for changes.

There's two means by which to acheive this:

Using Change Data Capture:

Data changes are tracked with timestamps. The history of the data changes can be tracked.

Change data capture provides historical change information for a user table by capturing both the fact that DML changes were made and the actual data that was changed. Changes are captured by using an asynchronous process that reads the transaction log and has a low impact on the system.

Using Change Tracking:

This has less overheads, but does not keep track of historical changes. The latest changes are kept, but no further back.

Change tracking captures the fact that rows in a table were changed, but does not capture the data that was changed. This enables applications to determine the rows that have changed with the latest row data being obtained directly from the user tables. Therefore, change tracking is more limited in the historical questions it can answer compared to change data capture. .../...