Logstash grok filter - name fields dynamically

Asked by Cipher · Jul 7, 2014 · Viewed 10.9k times

I've got log lines in the following format and want to extract fields:

[field1: content1] [field2: content2] [field3: content3] ...

I know neither the field names nor the number of fields.

I tried it with backreferences and the sprintf format but got no results:

match => [ "message", "(?:\[(\w+): %{DATA:\k<-1>}\])+" ] # not working
match => [ "message", "(?:\[%{WORD:fieldname}: %{DATA:%{fieldname}}\])+" ] # not working

This seems to work for only one field but not more:

match => [ "message", "(?:\[%{WORD:field}: %{DATA:content}\] ?)+" ]
add_field => { "%{field}" => "%{content}" }

The kv filter is also not appropriate because the content of the fields may contain whitespace.

Is there any plugin / strategy to fix this problem?
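For what it's worth, the behavior can be reproduced in plain Ruby: a repeated capture group keeps only its last match, which is why the single-pattern attempts above yield at most one field, whereas String#scan returns every match (field names here are just illustrative):

```ruby
line = "[field1: content1] [field2: content2] [field3: content3]"

# A repeated group retains only its LAST capture:
m = line.match(/(?:\[(\w+): ([^\]]+)\] ?)+/)
p [m[1], m[2]]
# => ["field3", "content3"]

# String#scan, by contrast, returns every key/value pair:
p line.scan(/\[(\w+): ([^\]]+)\]/)
# => [["field1", "content1"], ["field2", "content2"], ["field3", "content3"]]
```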

Answer

Ben Lim · Jul 7, 2014

Logstash Ruby Plugin can help you. :)

Here is the configuration:

input {
    stdin {}
}

filter {
    ruby {
        code => "
            # Split the message into '[key: value]' chunks, strip the
            # brackets, then set each pair as a field on the event.
            fieldArray = event['message'].split('] [')
            for field in fieldArray
                field = field.delete '['
                field = field.delete ']'
                result = field.split(': ')
                event[result[0]] = result[1]
            end
        "
    }
}

output {
    stdout {
        codec => rubydebug
    }
}
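If you want to try the parsing logic outside Logstash first, here is a minimal plain-Ruby sketch of what the filter's code block does, with a hash standing in for the event object (the limit of 2 on split is a small addition so a colon inside a value would survive):

```ruby
message = "[field1: content1] [field2: content2] [field3: content3]"

event = {}  # stand-in for the Logstash event
message.split('] [').each do |field|
  field = field.delete('[').delete(']')
  key, value = field.split(': ', 2)  # limit 2: only split on the first ': '
  event[key] = value
end

p event
# => {"field1"=>"content1", "field2"=>"content2", "field3"=>"content3"}
```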

With your logs:

[field1: content1] [field2: content2] [field3: content3]

This is the output:

{
   "message" => "[field1: content1] [field2: content2] [field3: content3]",
  "@version" => "1",
"@timestamp" => "2014-07-07T08:49:28.543Z",
      "host" => "abc",
    "field1" => "content1",
    "field2" => "content2",
    "field3" => "content3"
}

I have tried it with 4 fields, and it also works.

Please note that event in the ruby code is the Logstash event object. You can use it to access all your event fields, such as message, @timestamp, etc.
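As a side note for readers on newer releases (an assumption about your deployment, not something the original answer covers): Logstash 5.x removed direct hash-style access to the event inside the ruby filter, so the same idea would be written with event.get and event.set:

```
filter {
    ruby {
        code => "
            # Same logic as above, using the newer Event API
            # (event.get / event.set instead of event[...]).
            event.get('message').split('] [').each do |field|
                field = field.delete('[').delete(']')
                result = field.split(': ')
                event.set(result[0], result[1])
            end
        "
    }
}
```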

Enjoy it!!!