logstash output to elasticsearch with document_id; what to do when I don't have a document_id?

tedder42 · May 14, 2015 · Viewed 10.4k times

I have some Logstash input where I use the document_id to remove duplicates. However, most input doesn't have a document_id. The output below passes the actual document_id through, but when the field doesn't exist, the ID is taken as the literal string %{document_id}, which means most documents are seen as duplicates of each other. Here's what my output block looks like:

output {
        elasticsearch_http {
            host => "127.0.0.1"
            document_id => "%{document_id}"
        }
}

I thought I might be able to use a conditional in the output. It fails, and the error is given below the code.

output {
        elasticsearch_http {
            host => "127.0.0.1"
            if document_id {
                document_id => "%{document_id}"
            } 
        }
}

Error: Expected one of #, => at line 101, column 8 (byte 3103) after output {
        elasticsearch_http {
    host => "127.0.0.1"
    if 

I tried a few "if" statements and they all fail, which is why I assume the problem is having a conditional of any sort in that block. Here are the alternatives I tried:

if document_id <> "" {
if [document_id] <> "" {
if [document_id] {
if "hello" <> "" {

Answer

Magnus Bäck · May 14, 2015

You're close with the conditional idea, but you can't place a conditional inside a plugin block; it has to wrap the plugin. Do this instead:

output {
  if [document_id] {
    elasticsearch_http {
      host => "127.0.0.1"
      document_id => "%{document_id}"
    } 
  } else {
    elasticsearch_http {
      host => "127.0.0.1"
    } 
  }
}

(But the suggestion in one of the other answers to use the uuid filter is good too.)
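
For reference, here is a rough sketch of what the uuid filter approach might look like. It assumes the uuid filter plugin is available in your Logstash install and that the field is named document_id as in the question; the rest of the filter section is not shown and is up to you. The idea is to generate an ID only for events that arrive without one, so a single output block can always reference the field:

```
filter {
  # Only events that lack a document_id get a generated one.
  # (Assumption: the uuid filter plugin is installed.)
  if ![document_id] {
    uuid {
      target => "document_id"
    }
  }
}

output {
  elasticsearch_http {
    host => "127.0.0.1"
    # Every event has a document_id by now, so the sprintf
    # reference no longer falls back to the literal string.
    document_id => "%{document_id}"
  }
}
```

Events that already carry a document_id are still deduplicated on it, while events without one get a random ID and so are never treated as duplicates of each other. Note that the generated value is also stored as a document_id field in the indexed document.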