Parsing XML file using Logstash

KARAN SHAH · Feb 9, 2018 · Viewed 10.5k times · Source

I am trying to parse an XML file in Logstash and want to use XPath to extract fields from the document. When I run my config file, the data loads into Elasticsearch, but not in the form I want: each line of the XML document is indexed as a separate event.

Structure of my XML file

[image: structure of the XML file]

What I want to achieve:

Create fields in Elasticsearch that store the following:

ID = 1
Name = "Finch"

My Config file:

input{
    file{
        path => "C:\Users\186181152\Downloads\stations.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        exclude => "*.gz"
        type => "xml"
    }
}
filter{
    xml{
        source => "message"
        store_xml => false
        target => "stations"
        xpath => [
            "/stations/station/id/text()", "station_id",
            "/stations/station/name/text()", "station_name"
        ]
    }
}

output{
    elasticsearch{
        codec => json
        hosts => "localhost"
        index => "xmlns"
    }
    stdout{
        codec => rubydebug
    }
}

Output in Logstash:

{
    "station_name" => "%{station_name}",
    "path" => "C:\Users\186181152\Downloads\stations.xml",
    "@timestamp" => 2018-02-09T04:03:12.908Z,
    "station_id" => "%{station_id}",
    "@version" => "1",
    "host" => "BW",
    "message" => "\t\r",
    "type" => "xml"
}

Answer

KARAN SHAH · Feb 12, 2018

The multiline codec lets Logstash read the whole XML file as a single event, which the xml filter can then parse with XPath before the data is sent to Elasticsearch. In the codec we specify a pattern (<stations> in the example below) that Logstash looks for as it scans the file. With negate set to true and what set to "previous", every line that does not match the pattern is appended to the preceding event, so the line that matches the pattern and all the lines after it form a single event.

The following is a working config file for my data:

input {
    file {
        path => "C:\Users\186181152\Downloads\stations3.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        exclude => "*.gz"
        type => "xml"
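        codec => multiline {
            # A line containing <stations> marks the start of a new event
            pattern => "<stations>"
            # negate => true: act on lines that do NOT match the pattern
            negate => "true"
            # what => previous: append those lines to the preceding event
            what => "previous"
        }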
    }
}

filter {
    xml {
        source => "message"
        store_xml => false
        target => "stations"
        xpath => [
            "/stations/station/id/text()", "station_id",
            "/stations/station/name/text()", "station_name"
        ]
    }
}

output {
    elasticsearch {
        codec => json
        hosts => "localhost"
        index => "xmlns24"
    }
    stdout {
        codec => rubydebug
    }
}
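
One caveat worth checking, which the answer above does not state: with this codec, buffered lines are emitted as an event only when the next line matching the pattern arrives, so the last (or only) XML document in a file can stay in the buffer. The multiline codec has an auto_flush_interval option for this case. The sketch below is a variant of the input block above; the 2-second value is an arbitrary assumption and is not taken from the original config.

```
input {
    file {
        # Same file and settings as in the config above
        path => "C:\Users\186181152\Downloads\stations3.xml"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        type => "xml"
        codec => multiline {
            pattern => "<stations>"
            negate => "true"
            what => "previous"
            # Assumption: 2 is an illustrative value, tune as needed.
            # If no new line arrives within this many seconds, the
            # buffered lines are emitted as one event instead of
            # waiting for the next line that matches the pattern.
            auto_flush_interval => 2
        }
    }
}
```

The filter and output sections stay as they are. If the rubydebug output already shows the parsed station_id and station_name fields without this option, it is not needed.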