Filtering log files in Flume using interceptors

Shimon Benattar picture Shimon Benattar · Jul 22, 2013 · Viewed 10.9k times · Source

I have an http server writing log files which I then load into HDFS using Flume First I want to filter data according to data I have in my header or body. I read that I can do this using an interceptor with regex, can someone explain exactly what I need to do? Do I need to write Java code that overrides the Flume code?

Also I would like to take data and according to the header send it to a different sink (i.e source=1 goes to sink1 and source=2 goes to sink2) how is this done?

thank you,

Shimon

Answer

Dmitry picture Dmitry · Jul 22, 2013

You don't need to write Java code to filter events. Use Regex Filtering Interceptor to filter events which body text matches some regular expression:

agent.sources.logs_source.interceptors = regex_filter_interceptor
agent.sources.logs_source.interceptors.regex_filter_interceptor.type = regex_filter
agent.sources.logs_source.interceptors.regex_filter_interceptor.regex = <your regex>
agent.sources.logs_source.interceptors.regex_filter_interceptor.excludeEvents = true

To route events based on headers use Multiplexing Channel Selector:

a1.sources = r1
a1.channels = c1 c2 c3 c4
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2 c3
a1.sources.r1.selector.default = c4

Here events with header "state"="CZ" go to channel "c1", with "state"="US" - to "c2" and "c3", all other - to "c4".

This way you can also filter events by header - just route specific header value to channel, which points to Null Sink.