I am parsing a set of data into an ELK stack for some non-technical folks to view. As part of this, I want to remove all fields except a specific, known subset of fields from the events before sending them into Elasticsearch.
I can explicitly specify each field to drop in a mutate filter like so:
filter {
  mutate {
    remove_field => [ "throw_away_field1", "throw_away_field2" ]
  }
}
In this case, any time a new field gets added to the input data (which can happen often, since the data is pulled from a queue and used by multiple systems for multiple purposes), the filter would also need updating, which is extra overhead that isn't needed. Not to mention that if some sensitive data slipped through between when the input streams were updated and when the filtering was updated, that could be bad.
Is there a way, using a Logstash filter, to iterate over each field of an event and remove_field it if it is not in a provided list of field names? Or would I have to write a custom filter to do this? Basically, for every single event, I just want to keep 8 specific fields and toss absolutely everything else.
It looks like only very minimal if ![field] =~ /^value$/ type logic is available in the logstash.conf file, and I don't see any examples that iterate over the fields themselves in a for-each style and compare each field name to a list of values.
Answer:
After upgrading Logstash to 1.5.0 to be able to use plugin extensions such as prune, the solution ended up looking like this:
filter {
  prune {
    interpolate => true
    whitelist_names => ["fieldtokeep1","fieldtokeep2"]
  }
}
The prune filter's whitelist should be what you're looking for.
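One thing worth knowing (per the prune plugin's documented behavior, so double-check the docs for your version): whitelist_names entries are matched as regular expressions, so anchoring them keeps the match exact. With the same placeholder field names as above, that would look something like this:

filter {
  prune {
    # Anchored patterns so a field named e.g. "fieldtokeep1_copy" is not kept by accident.
    whitelist_names => ["^fieldtokeep1$", "^fieldtokeep2$"]
  }
}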
For more specific control, dropping to the ruby filter is probably the next step.
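As a rough sketch of that ruby filter approach (the keep list here is a placeholder, and you will likely want to include @timestamp and @version so downstream outputs keep working), something along these lines should work:

filter {
  ruby {
    # Remove every field whose name is not in the keep list.
    # Field names below are placeholders; substitute the 8 fields you actually need.
    code => "
      keep = ['@timestamp', '@version', 'fieldtokeep1', 'fieldtokeep2']
      event.to_hash.keys.each do |field|
        event.remove(field) unless keep.include?(field)
      end
    "
  }
}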