Getting Started With Logstash Filters

Chris · Dec 17, 2013 · Viewed 25.8k times

Looking for a little help getting started... I have Logstash installed (as well as ElasticSearch) but I'm struggling with my first filter.

As a test I have it configured to read from a trimmed log file containing 6 lines; each line begins with a timestamp such as [11/5/13 4:09:21:327 PST] followed by a bunch of other data.

For now I have my conf file set to read this file and I'm trying to do a very basic grok filter to match the lines, maybe to grab the timestamp and then the rest of the data (from where I can start splitting it up).

Here is what I have:

input {
  file {
    type => "chris"
    path => "/home/chris/Documents/test.log" 
  }
}
filter {
  grok {
    type => "chris"
    pattern => "%{GREEDYDATA:logline}"
  }
}
output {
  stdout { debug => true debug_format => "json" }
}

I was kind of expecting (hoping) that when I ran Logstash it'd match each line and output it, and then I could start breaking the lines down and filtering by adjusting the pattern, but as I can't get this first basic bit to work I'm a little stumped.

Does anyone have a similar conf file they'd be okay with sharing? Most of the examples I can find are more advanced, and I seem to be stuck trying to get out of the gate.

Thanks,

Chris.

Answer

stuart-warren · Jan 6, 2014

Start off by removing the contents of the filter.

The docs for the current version (1.3.2) of the Logstash grok filter plugin are here: http://logstash.net/docs/1.3.2/filters/grok

Ensure you are looking at the correct version of the docs for the version of logstash you have downloaded.

An example Grok filter would be:

filter {
  grok {
    match => [ "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ]
  }
}

But this is unlikely to match your data.

"message" is the default field your entire log line ends up in, so it's likely a good choice for you too.

The pattern creates five new fields (client, method, request, bytes, and duration) by reading the log line and matching parts of it against predefined Grok patterns such as IP, WORD, etc. You'd need to change this to suit your data.
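For instance, run against a log line shaped like that pattern (this sample line is purely illustrative, not from your data), it would produce fields roughly like:

```
55.3.244.1 GET /index.html 15824 0.043

client:   55.3.244.1
method:   GET
request:  /index.html
bytes:    15824
duration: 0.043
```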

Start off with

filter {
  grok {
    match => [ "message", "%{GREEDYDATA:logline}" ]
  }
}

This will just duplicate the message field into a separate logline field, but it's somewhere to start. As you add more Grok patterns to the filter, the logline field will only contain whatever is left ungrokked.

You can test out your Grok patterns here: http://grokdebug.herokuapp.com/

You will likely want to use the grok filter to grok out the timestamp into its own field, and then use the date filter to actually use that as the log's timestamp.

filter {
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:syslog_timestamp} %{GREEDYDATA:syslog5424_msg}" ]
  }
  date {
    match => [ "syslog_timestamp", "ISO8601" ]
  }
}

TIMESTAMP_ISO8601 matches timestamps in a fairly verbose format (see http://grokdebug.herokuapp.com/patterns#); this may not match yours.

ISO8601 is a format the date filter already knows about; you may need to manually specify your date format here instead. See the docs: http://logstash.net/docs/1.3.2/filters/date
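As a rough sketch for a timestamp like the [11/5/13 4:09:21:327 PST] in your question (the field name log_timestamp and the exact date format string are assumptions you'd need to verify against your actual lines, and short timezone abbreviations like PST don't always parse cleanly):

```
filter {
  grok {
    # capture everything between the square brackets into log_timestamp,
    # and the rest of the line into logline
    match => [ "message", "\[%{DATA:log_timestamp}\] %{GREEDYDATA:logline}" ]
  }
  date {
    # Joda-Time-style format string; adjust the day/month order, hour
    # digits, and timezone token to match what your log actually emits
    match => [ "log_timestamp", "M/d/yy H:mm:ss:SSS z" ]
  }
}
```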