ElasticSearch Nest Insert/Update

Andrew Walters · Aug 18, 2016

I have created an index in Elasticsearch using the following query:

PUT public_site
{
  "mappings": {
    "page": {
      "properties": {
        "url": {
          "type": "string"
        },
        "title":{
          "type": "string"
        },
        "body":{
          "type": "string"
        },
        "meta_description":{
          "type": "string"
        },
        "keywords":{
          "type": "string"
        },
        "category":{
          "type": "string"
        },
        "last_updated_date":{
          "type": "date"
        },
        "source_id":{
          "type": "string"
        }
      }
    }
  }
}

I would like to insert a document into this index using the .NET NEST library. My issue is that the signature of NEST's Update method doesn't make any sense to me.

client.Update<TDocument>(IUpdateRequest<TDocument,TPartialDocument>)

The Java library makes so much more sense to me:

UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("type");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
        .startObject()
            .field("gender", "male")
        .endObject());
client.update(updateRequest).get();

In NEST where do the TDocument and TPartialDocument classes come from? Are these C# classes that I make representing my index?

Answer

Russ Cam · Aug 19, 2016

TDocument and TPartialDocument are generic type parameters for the POCO types that represent

  • a document in Elasticsearch (TDocument) and
  • part of a document in Elasticsearch (TPartialDocument), when performing a partial update.

In the case of a full update, TDocument and TPartialDocument may refer to the same concrete POCO type. Let's have a look at some examples to demonstrate.

Let's create an index with the mapping that you have defined above. Firstly, we can represent a document using a POCO type

public class Page
{
    public string Url { get; set; }

    public string Title { get; set; }

    public string Body { get; set; }

    [String(Name="meta_description")]
    public string MetaDescription { get; set; }

    public IList<string> Keywords { get; set; }

    public string Category { get; set; }

    [Date(Name="last_updated_date")]
    public DateTimeOffset LastUpdatedDate { get; set; }

    [String(Name="source_id")]
    public string SourceId { get; set; }
}

By default, when NEST serializes POCO properties it uses a camel casing naming convention. Because your index uses snake casing for some properties, e.g. "last_updated_date", we can use attributes to override the names that NEST serializes these properties to.
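To see why only some properties need an attribute, here is a minimal sketch of the camel casing idea. It is a rough stand-in written for illustration (FieldNameSketch and ToCamelCase are hypothetical names, not NEST's own implementation) and simply lower-cases the first character of a property name.

```csharp
using System;

public static class FieldNameSketch
{
    // Rough stand-in for camel casing: lower-case the first character only.
    // This is an illustrative assumption, not NEST's actual code.
    public static string ToCamelCase(string propertyName)
    {
        if (string.IsNullOrEmpty(propertyName) || char.IsLower(propertyName[0]))
            return propertyName;

        return char.ToLowerInvariant(propertyName[0]) + propertyName.Substring(1);
    }
}
```

Under this sketch, Url becomes "url", which already matches the mapping, whereas MetaDescription becomes "metaDescription", which does not match "meta_description". That mismatch is what the Name value on the attributes corrects.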

Next, let's create the client to work with

var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var pagesIndex = "pages";
var connectionSettings = new ConnectionSettings(pool)
        .DefaultIndex(pagesIndex)
        .PrettyJson()
        .DisableDirectStreaming()
        .OnRequestCompleted(response =>
            {
                // log out the request
                if (response.RequestBodyInBytes != null)
                {
                    Console.WriteLine(
                        $"{response.HttpMethod} {response.Uri} \n" +
                        $"{Encoding.UTF8.GetString(response.RequestBodyInBytes)}");
                }
                else
                {
                    Console.WriteLine($"{response.HttpMethod} {response.Uri}");
                }

                Console.WriteLine();

                // log out the response
                if (response.ResponseBodyInBytes != null)
                {
                    Console.WriteLine($"Status: {response.HttpStatusCode}\n" +
                             $"{Encoding.UTF8.GetString(response.ResponseBodyInBytes)}\n" +
                             $"{new string('-', 30)}\n");
                }
                else
                {
                    Console.WriteLine($"Status: {response.HttpStatusCode}\n" +
                             $"{new string('-', 30)}\n");
                }
            });

var client = new ElasticClient(connectionSettings);

The connection settings have been configured in a way that is helpful whilst developing:

  1. DefaultIndex() - The default index has been configured to be "pages". If no explicit index name is passed on a request and no index name can be inferred for a POCO, then the default index will be used.
  2. PrettyJson() - Prettify (i.e. indent) json requests and responses. This will be useful to see what is being sent to and received from Elasticsearch.
  3. DisableDirectStreaming() - NEST by default serializes POCOs to the request stream and deserializes response types from the response stream. Disabling this direct streaming will buffer the request and response bytes in memory streams, allowing us to log them out in OnRequestCompleted()
  4. OnRequestCompleted() - Called after a response is received. This allows us to log out requests and responses whilst we're developing.

Settings 2, 3 and 4 are useful during development but come with some performance overhead, so you may decide not to use them in production.
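One way to keep the development-only settings out of production is to apply them conditionally. The sketch below assumes the same NEST version as the rest of this answer and uses only the settings calls shown above; ClientFactory and the isDevelopment flag are hypothetical names introduced for illustration.

```csharp
using System;
using Elasticsearch.Net;
using Nest;

public static class ClientFactory
{
    // isDevelopment is a hypothetical flag; wire it up to whatever
    // environment switch your application already has.
    public static ElasticClient Create(Uri node, string defaultIndex, bool isDevelopment)
    {
        var pool = new SingleNodeConnectionPool(node);
        var settings = new ConnectionSettings(pool)
            .DefaultIndex(defaultIndex);

        if (isDevelopment)
        {
            // Readable JSON and buffered request/response bytes,
            // only where the overhead is acceptable.
            settings = settings
                .PrettyJson()
                .DisableDirectStreaming();
        }

        return new ElasticClient(settings);
    }
}
```

The OnRequestCompleted() logging callback could be added inside the same if block.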

Now, let's create the index with the Page mapping

// delete the index if it exists. Useful for demo purposes so that
// we can re-run this example.
if (client.IndexExists(pagesIndex).Exists)
    client.DeleteIndex(pagesIndex);

// create the index, adding the mapping for the Page type to the index
// at the same time. AutoMap() will infer the mapping from the POCO
var createIndexResponse = client.CreateIndex(pagesIndex, c => c
    .Mappings(m => m
        .Map<Page>(p => p
            .AutoMap()
        )
    )
);

Take a look at the automapping documentation for more details around how you can control mapping for POCO types

Indexing a new Page type is as simple as

// create a sample Page
var page = new Page
{
    Title = "Sample Page",
    Body = "Sample Body",
    Category = "sample",
    Keywords = new List<string>
    {
        "sample",
        "example", 
        "demo"
    },
    LastUpdatedDate = DateTime.UtcNow,
    MetaDescription = "Sample meta description",
    SourceId = "1",
    Url = "/pages/sample-page"
};

// index the sample Page into Elasticsearch.
// NEST will infer the document type (_type) from the POCO type,
// by default it will camel case the POCO type name
var indexResponse = client.Index(page);

Indexing a document will create the document if it does not exist, or overwrite an existing document if it does exist. Elasticsearch has optimistic concurrency control that can be used to control how this behaves under different conditions.

We can update a document using the Update methods, but first a little background.

We can get a document from Elasticsearch by specifying the index, type and id. NEST makes this slightly easier because it can infer all of these from the POCO. When we created our mapping, we didn't specify an Id property on the POCO. If NEST sees a property called Id, it uses this as the id for the document; because we don't have one, Elasticsearch will generate an id for the document and put it in the document metadata. Because the document metadata is separate from the source document, however, this can make modelling documents as POCO types a little trickier (but not impossible): for a given response, we have access to the id of the document through the metadata and access to the source through the _source field, and we can combine the id with our source in the application.

An easier way to address this, though, is to have an id on the POCO. We can specify an Id property on the POCO and this will be used as the id of the document. We don't have to call the property Id, but if we don't, we need to tell NEST which property represents the id. This can be done with an attribute. Assuming that SourceId is a unique id for a Page instance, use the IdProperty property of ElasticsearchTypeAttribute to specify this. We probably also don't want to analyze this string but index it verbatim instead; we can control this through the Index property of the attribute on the property

[ElasticsearchType(IdProperty = nameof(Page.SourceId))]
public class Page
{
    public string Url { get; set; }

    public string Title { get; set; }

    public string Body { get; set; }

    [String(Name="meta_description")]
    public string MetaDescription { get; set; }

    public IList<string> Keywords { get; set; }

    public string Category { get; set; }

    [Date(Name="last_updated_date")]
    public DateTimeOffset LastUpdatedDate { get; set; }

    [String(Name="source_id", Index=FieldIndexOption.NotAnalyzed)]
    public string SourceId { get; set; }
}

With these in place, we would need to recreate the index as before so that these changes are reflected in the mapping and NEST can use this configuration when indexing a Page instance.

Now, back to updates :) We can get a document from Elasticsearch, update it in the application and then re-index it

var getResponse = client.Get<Page>("1");

var page = getResponse.Source;

// update the last updated date 
page.LastUpdatedDate = DateTime.UtcNow;

var updateResponse = client.Update<Page>(page, u => u.Doc(page));

The first argument identifies the document we want to update; NEST can infer the id from the Page instance. Since we are passing the entire document back here and updating all the fields, we could have just used Index() instead of Update()

var indexResponse = client.Index(page);

However, since we only want to update the LastUpdatedDate, having to fetch the document from Elasticsearch, update it in the application and then send the whole document back to Elasticsearch is a lot of work. Instead, we can send only the updated LastUpdatedDate to Elasticsearch using a partial document. C# anonymous types are really useful here

// model our partial document with an anonymous type. 
// Note that we need to use the snake casing name
// (NEST will still camel case the property names but this
//  doesn't help us here)
var lastUpdatedDate = new
{
    last_updated_date = DateTime.UtcNow
};

// do the partial update. 
// Page is TDocument, object is TPartialDocument
var partialUpdateResponse = client.Update<Page, object>("1", u => u
    .Doc(lastUpdatedDate)
);
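If you would rather not use an anonymous type, TPartialDocument can also be a small class of your own. The following is a sketch under the same assumptions as the rest of this answer (same NEST version, and the Page POCO defined above); PageLastUpdated and PageUpdates are hypothetical names, and the calls used are the Update, Doc and Date attribute already shown.

```csharp
using System;
using Nest;

// Hypothetical partial-document POCO: it holds only the field we want to change.
public class PageLastUpdated
{
    // Same name override as on Page, so the field is sent as "last_updated_date".
    [Date(Name = "last_updated_date")]
    public DateTimeOffset LastUpdatedDate { get; set; }
}

public static class PageUpdates
{
    // Page (defined earlier in this answer) is TDocument,
    // PageLastUpdated is TPartialDocument.
    public static bool TouchLastUpdated(IElasticClient client, string id)
    {
        var response = client.Update<Page, PageLastUpdated>(id, u => u
            .Doc(new PageLastUpdated { LastUpdatedDate = DateTimeOffset.UtcNow })
        );

        // IsValid reports whether the request succeeded.
        return response.IsValid;
    }
}
```

A named class costs a few more lines than an anonymous type, but it keeps the field name override next to the property and can be reused wherever the same partial update is needed.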

We can use optimistic concurrency control here if we need to, using RetryOnConflict(int)

var partialUpdateResponse = client.Update<Page, object>("1", u => u
    .Doc(lastUpdatedDate)
    .RetryOnConflict(1)
);

With a partial update, Elasticsearch will get the document, apply the partial update and then index the updated document; if the document changes between getting and updating, Elasticsearch is going to retry this once more based on RetryOnConflict(1).
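The get, merge, index and retry cycle described above can be modelled with a toy in-memory document. This is not Elasticsearch or NEST code; VersionedDocument, PartialUpdateSketch and the betweenGetAndIndex hook are hypothetical names used only to illustrate the behaviour, assuming a simple version number that increases on every write.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a single versioned document.
public class VersionedDocument
{
    public long Version { get; private set; } = 1;
    public Dictionary<string, object> Source { get; } = new Dictionary<string, object>();

    // Indexing succeeds only if the caller saw the current version.
    public bool TryIndex(long expectedVersion, Dictionary<string, object> newSource)
    {
        if (expectedVersion != Version) return false; // version conflict

        Source.Clear();
        foreach (var field in newSource) Source[field.Key] = field.Value;
        Version++;
        return true;
    }
}

public static class PartialUpdateSketch
{
    // Get, merge, index; on a conflict, repeat the cycle up to retryOnConflict more times.
    // betweenGetAndIndex is a hypothetical hook for simulating a concurrent writer.
    public static bool Update(
        VersionedDocument doc,
        Dictionary<string, object> partial,
        int retryOnConflict,
        Action betweenGetAndIndex = null)
    {
        for (var attempt = 0; attempt <= retryOnConflict; attempt++)
        {
            var seenVersion = doc.Version;                                   // get
            var merged = new Dictionary<string, object>(doc.Source);
            foreach (var field in partial) merged[field.Key] = field.Value; // apply the partial document
            betweenGetAndIndex?.Invoke();
            if (doc.TryIndex(seenVersion, merged)) return true;             // index
        }

        return false; // still conflicting after all retries
    }
}
```

In this model, fields that are not in the partial document are left untouched, and a retry re-reads the document so that changes made by the other writer are kept.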

Hope that helps :)