Selective parsing of csv file using logstash

Sagnik Sinha · Dec 17, 2014

I am trying to feed data into Elasticsearch from CSV files through Logstash. The first row of each file contains the column names. Is there a way to skip that row while parsing the file? Are there any conditionals or filters I could use so that, if an exception occurs, Logstash skips to the next row?

My config file looks like:

input {  
      file {
          path => "/home/sagnik/work/logstash-1.4.2/bin/promosms_dec15.csv"
          type => "promosms_dec15"
          start_position => "beginning"
          sincedb_path => "/dev/null"
      }
}
filter {

    csv {
        columns => ["Comm_Plan","Queue_Booking","Order_Reference","Generation_Date"]
        separator => ","
    }  
    ruby {
          code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
    }

}
output {  
    elasticsearch { 
        action => "index"
        host => "localhost"
        index => "promosms-%{+dd.MM.YYYY}"
        workers => 1
    }
}

The first few rows of my CSV file look like:

"Comm_Plan","Queue_Booking","Order_Reference","Generation_Date"
"","No","FMN1191MVHV","31/03/2014"
"","No","FMN1191N64G","31/03/2014"
"","No","FMN1192OPMY","31/03/2014"

Is there any way I could skip the first line? Also, if my CSV file ends with an empty line, I get an error. How do I skip such empty lines, whether they come at the end of the file or between two rows?

Answer

Rumbles · Dec 17, 2014

A simple way to do it would be to add the following to your filter (after csv, before ruby):

if [Comm_Plan] == "Comm_Plan" {
  drop { }
}

Assuming the field would never normally have the same value as the column heading, this should work as expected. However, you could be more specific by using:

if [Comm_Plan] == "Comm_Plan" and [Queue_Booking] == "Queue_Booking" and [Order_Reference] == "Order_Reference" and [Generation_Date] == "Generation_Date" {
  drop { }
}

All this does is check whether the fields hold those particular values and, if they do, drop the event.
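
Putting the pieces together, the filter section might look like the sketch below. The header check sits between csv and ruby, as described above. The first conditional is an addition that is not part of the original answer: it assumes blank lines reach the filter as events whose message field is empty or contains only whitespace, and drops them before the csv filter sees them. Treat it as an untested sketch rather than a verified configuration for Logstash 1.4.2.

```
filter {
    # Assumption: an empty line in the file arrives as an event whose
    # message is empty or whitespace only. Drop it before parsing.
    if [message] =~ /^\s*$/ {
        drop { }
    }

    csv {
        columns => ["Comm_Plan","Queue_Booking","Order_Reference","Generation_Date"]
        separator => ","
    }

    # Drop the header row before the ruby filter tries to parse its date.
    if [Comm_Plan] == "Comm_Plan" {
        drop { }
    }

    ruby {
        code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
    }
}
```

Because both drop conditionals run before the ruby filter, neither the header row nor a blank line should reach Date.parse, which is where the exception would otherwise be raised.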