Logstash indexing JSON arrays

JP. · Feb 27, 2014 · Viewed 13.1k times

Logstash is awesome. I can send it JSON like this (multi-lined for readability):

{
  "a": "one"
  "b": {
    "alpha":"awesome"
  }
}

And then query for that line in Kibana using the search term b.alpha:awesome. Nice.

However I now have a JSON log line like this:

{
  "different":[
    {
      "this": "one",
      "that": "uno"
    },
    {
      "this": "two"
    }
  ]
}

And I'd like to be able to find this line with a search like different.this:two (or different.this:one, or different.that:uno).

If I were using Lucene directly, I'd iterate through the different array and generate a new search index for each hash within it, but Logstash currently seems to ingest that line like this:

different: {this: one, that: uno}, {this: two}

which isn't going to help me search for log lines using different.this or different.that.
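For clarity, here's a plain-Ruby sketch of the flattening I'm after (the helper name is mine, not anything Logstash provides): each hash in the array should contribute its values to dotted, multi-valued fields.

```ruby
require 'json'

# Hypothetical helper illustrating the desired behaviour: collect each
# key of each hash in the array under a dotted "field.key" name.
def flatten_array_field(event, field)
  flat = Hash.new { |h, k| h[k] = [] }
  event[field].each do |obj|
    obj.each { |k, v| flat["#{field}.#{k}"] << v }
  end
  flat
end

line = '{"different":[{"this":"one","that":"uno"},{"this":"two"}]}'
flat = flatten_array_field(JSON.parse(line), "different")
# flat["different.this"] => ["one", "two"]
# flat["different.that"] => ["uno"]
```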

Anyone got any thoughts as to a codec, filter, or code change I can make to enable this?

Answer

vzamanillo · Mar 26, 2014

You can write your own filter (copy &amp; paste, rename the class name and the config_name, and rewrite the filter(event) method), or modify the current JSON filter (source on GitHub).

You can find the JSON filter (Ruby class) source code under logstash-1.x.x\lib\logstash\filters, in a file named json.rb. The JSON filter parses the content as JSON as follows:

begin
  # TODO(sissel): Note, this will not successfully handle json lists
  # like your text is '[ 1,2,3 ]' JSON.parse gives you an array (correctly)
  # which won't merge into a hash. If someone needs this, we can fix it
  # later.
  dest.merge!(JSON.parse(source))

  # If no target, we target the root of the event object. This can allow
  # you to overwrite @timestamp. If so, let's parse it as a timestamp!
  if !@target && event[TIMESTAMP].is_a?(String)
    # This is a hack to help folks who are mucking with @timestamp during
    # their json filter. You aren't supposed to do anything with
    # "@timestamp" outside of the date filter, but nobody listens... ;)
    event[TIMESTAMP] = Time.parse(event[TIMESTAMP]).utc
  end

  filter_matched(event)
rescue => e
  event.tag("_jsonparsefailure")
  @logger.warn("Trouble parsing json", :source => @source,
               :raw => event[@source], :exception => e)
  return
end
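To see why the stock filter leaves the array opaque, it helps to look at what JSON.parse actually returns for the log line from the question (plain Ruby, outside Logstash):

```ruby
require 'json'

source = '{"different":[{"this":"one","that":"uno"},{"this":"two"}]}'
dest = {}
# This mirrors the filter's dest.merge!(JSON.parse(source)) step: the
# whole array lands under a single "different" key as an Array of
# Hashes, rather than as separate searchable sub-fields.
dest.merge!(JSON.parse(source))

dest["different"].class        # Array
dest["different"][1]["this"]   # "two"
```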

You can modify the parsing procedure to transform the original JSON before it is merged into the event:

json = JSON.parse(source)
if json.is_a?(Hash)
  json.each do |key, value|
    if value.is_a?(Array)
      value.each_with_index do |object, index|
        # modify as you need
        object["index"] = index
      end
    end
  end
end
# save modified json
# ......
dest.merge!(json)
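Applied to the log line from the question, that loop tags each element of the array with its position, so the array entries at least become distinguishable:

```ruby
require 'json'

# Same transformation as above, run against the question's log line.
json = JSON.parse('{"different":[{"this":"one","that":"uno"},{"this":"two"}]}')
if json.is_a?(Hash)
  json.each do |key, value|
    if value.is_a?(Array)
      value.each_with_index { |object, index| object["index"] = index }
    end
  end
end

# json["different"] now holds:
# [{"this"=>"one", "that"=>"uno", "index"=>0}, {"this"=>"two", "index"=>1}]
```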

Then you can modify your config file to use the new/modified JSON filter, and place the config in \logstash-1.x.x\lib\logstash\config.

This is my elastic_with_json.conf, using the modified json.rb filter:

input {
    stdin { }
}
filter {
    json {
        source => "message"
    }
}
output {
    elasticsearch {
        host => "localhost"
    }
    stdout { }
}

If you want to use your new filter instead, configure it with its config_name:

class LogStash::Filters::Json_index < LogStash::Filters::Base

  config_name "json_index"
  milestone 2
  ....
end

and reference it in your config:

input {
    stdin { }
}
filter {
    json_index {
        source => "message"
    }
}
output {
    elasticsearch {
        host => "localhost"
    }
    stdout { }
}

Hope this helps.