I am trying to parse an XML file in Logstash. I want to use XPath to do the parsing of documents in XML. So when I run my config file the data loads into elasticsearch
but It is not in the way I want to load the data. The data loaded in elasticsearch
is each line in xml document
Structure of my XML file
What I want to achieve:
create fields in elasticsearch that stores the follwing
ID =1
Name = "Finch"
My Config file:
input{
file{
path => "C:\Users\186181152\Downloads\stations.xml"
start_position => "beginning"
sincedb_path => "/dev/null"
exclude => "*.gz"
type => "xml"
}
}
filter{
xml{
source => "message"
store_xml => false
target => "stations"
xpath => [
"/stations/station/id/text()", "station_id",
"/stations/station/name/text()", "station_name"
]
}
}
output{
elasticsearch{
codec => json
hosts => "localhost"
index => "xmlns"
}
stdout{
codec => rubydebug
}
}
Output in Logstash:
{
"station_name" => "%{station_name}",
"path" => "C:\Users\186181152\Downloads\stations.xml",
"@timestamp" => 2018-02-09T04:03:12.908Z,
"station_id" => "%{station_id}",
"@version" => "1",
"host" => "BW",
"message" => "\t\r",
"type" => "xml"
}
The multiline filter allows to create xml file as a single event and we can use xml-filter or xpath to parse the xml to ingest data in elasticsearch. In the multiline filter, we mention a pattern( in below example) that is used by logstash to scan your xml file. Once the pattern matches all the entries after that will be considered as a single event.
The following is an example of working config file for my data
input {
file {
path => "C:\Users\186181152\Downloads\stations3.xml"
start_position => "beginning"
sincedb_path => "/dev/null"
exclude => "*.gz"
type => "xml"
codec => multiline {
pattern => "<stations>"
negate => "true"
what => "previous"
}
}
}
filter {
xml {
source => "message"
store_xml => false
target => "stations"
xpath => [
"/stations/station/id/text()", "station_id",
"/stations/station/name/text()", "station_name"
]
}
}
output {
elasticsearch {
codec => json
hosts => "localhost"
index => "xmlns24"
}
stdout {
codec => rubydebug
}
}