I am using Filebeat to read XML files on Windows and ship them to Logstash for filtering before indexing into Elasticsearch.
The Filebeat side works and I'm getting XML blocks into Logstash, but it looks like I misconfigured the Logstash filter that should parse each XML block into separate fields and store those fields under an Elasticsearch type.
Here is my XML sample data:
<H_Ticket> <IDH_Ticket>26</IDH_Ticket> <CodeBus>186</CodeBus> <CodeCh>5531</CodeCh> <CodeConv>5531</CodeConv> <Codeligne>12</Codeligne> <Date>20150915</Date> <Heur>1110</Heur> <NomFR1>SOUK AHAD</NomFR1> <NomFR2>KANTAOUI </NomFR2> <Prix>0.66</Prix> <IDTicket>26</IDTicket> <CodeRoute>107</CodeRoute> <origine>01</origine> <Distination>06</Distination> <Num>6</Num> <Ligne>107</Ligne> <requisition> </requisition> <voyage>0</voyage> <faveur> </faveur> </H_Ticket>
<H_Ticket> <IDH_Ticket>26</IDH_Ticket> <CodeBus>186</CodeBus> <CodeCh>5531</CodeCh> <CodeConv>5531</CodeConv> <Codeligne>12</Codeligne> <Date>20150915</Date> <Heur>1110</Heur> <NomFR1>SOUK AHAD</NomFR1> <NomFR2>KANTAOUI </NomFR2> <Prix>0.66</Prix> <IDTicket>26</IDTicket> <CodeRoute>107</CodeRoute> <origine>01</origine> <Distination>06</Distination> <Num>6</Num> <Ligne>107</Ligne> <requisition> </requisition> <voyage>0</voyage> <faveur> </faveur> </H_Ticket>
<H_Ticket> <IDH_Ticket>26</IDH_Ticket> <CodeBus>186</CodeBus> <CodeCh>5531</CodeCh> <CodeConv>5531</CodeConv> <Codeligne>12</Codeligne> <Date>20150915</Date> <Heur>1110</Heur> <NomFR1>SOUK AHAD</NomFR1> <NomFR2>KANTAOUI </NomFR2> <Prix>0.66</Prix> <IDTicket>26</IDTicket> <CodeRoute>107</CodeRoute> <origine>01</origine> <Distination>06</Distination> <Num>6</Num> <Ligne>107</Ligne> <requisition> </requisition> <voyage>0</voyage> <faveur> </faveur> </H_Ticket>
And here is my logstash config file:
input {
  beats {
    port => 5044
  }
}
filter {
  xml {
    source => "ticket"
    xpath => [
      "/ticket/IDH_Ticket/text()", "ticketId",
      "/ticket/CodeBus/text()", "codeBus",
      "/ticket/CodeCh/text()", "codeCh",
      "/ticket/CodeConv/text()", "codeConv",
      "/ticket/Codeligne/text()", "codeLigne",
      "/ticket/Date/text()", "date",
      "/ticket/Heur/text()", "heure",
      "/ticket/NomFR1/text()", "nomFR1",
      "/ticket/NomAR1/text()", "nomAR1",
      "/ticket/NomFR2/text()", "nomFR2",
      "/ticket/NomAR2/text()", "nomAR2",
      "/ticket/Prix/text()", "prix",
      "/ticket/IDTicket/text()", "idTicket",
      "/ticket/CodeRoute/text()", "codeRoute",
      "/ticket/origine/text()", "origine",
      "/ticket/Distination/text()", "destination",
      "/ticket/Num/text()", "num",
      "/ticket/Ligne/text()", "ligne",
      "/ticket/requisition/text()", "requisition",
      "/ticket/voyage/text()", "voyage",
      "/ticket/faveur/text()", "faveur"
    ]
    store_xml => true
    target => "doc"
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "buses"
    document_type => "ticket"
  }
  file {
    path => "C:\busesdata\logstash.log"
  }
  stdout { codec => rubydebug }
}
Filebeat configuration:
filebeat:
  # List of prospectors to fetch data.
  prospectors:
    -
      paths:
        - C:\busesdata\*.xml
      input_type: log
      document_type: ticket
      scan_frequency: 10s
      multiline:
        pattern: '<H_Ticket'
        negate: true
        match: after
output:
  ### Logstash as output
  logstash:
    hosts: ["localhost:5044"]
    index: filebeat
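The multiline settings above are what stitch each ticket into one event: with negate: true and match: after, a line matching <H_Ticket opens a new event and every following non-matching line is appended to it (which is also why the XML declaration and <HF_DOCUMENT> header arrive as a leading event of their own, as the first stdout record below shows). A rough Python sketch of that grouping, with illustrative sample lines rather than Filebeat's actual implementation:

```python
import re

def group_multiline(lines, pattern):
    """Approximate Filebeat multiline with negate=true, match=after:
    a line matching `pattern` starts a new event; non-matching lines
    are appended to the event that is currently open."""
    events, current = [], []
    start = re.compile(pattern)
    for line in lines:
        if start.search(line) and current:
            events.append("\n".join(current))  # close the previous event
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events

sample = [
    "\t<H_Ticket>",
    "\t\t<IDH_Ticket>26</IDH_Ticket>",
    "\t</H_Ticket>",
    "\t<H_Ticket>",
    "\t\t<IDH_Ticket>27</IDH_Ticket>",
    "\t</H_Ticket>",
]
print(len(group_multiline(sample, "<H_Ticket")))  # 2 events
```

Note that the pattern '<H_Ticket' does not match the closing tag </H_Ticket> (the '<' there is followed by '/'), so each closing tag stays inside its event.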
And here is a portion of both stdout and file output:
PS C:\logstash-2.3.3\bin> .\logstash -f .\logstash_temp.conf
io/console not supported; tty will not be manipulated
Settings: Default pipeline workers: 4
Pipeline main started
{
"message" => "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r\n<?xml-stylesheet href=\"ticket.xsl\" type=\"text/xsl\"?>\n<HF_DOCUMENT>",
"@version" => "1",
"@timestamp" => "2016-07-03T12:13:28.892Z",
"source" => "C:\\busesdata\\ticket2.xml",
"type" => "ticket",
"input_type" => "log",
"fields" => nil,
"beat" => {
"hostname" => "hp-pavillion-g6",
"name" => "hp-pavillion-g6"
},
"offset" => 0,
"count" => 1,
"host" => "hp-pavillion-g6",
"tags" => [
[0] "beats_input_codec_plain_applied"
]
}
{
"message" => "\t<H_Ticket>\r\n\t\t<IDH_Ticket>1</IDH_Ticket>\r\n\t\t<CodeBus>186</CodeBus>\r\n\t\t<CodeCh>5531</CodeCh>\r\n\t\t<CodeConv>5531</CodeConv>\r\n\t\t<Codeligne>12</Codeligne>\r\n\t\t<Date>20150903</Date>\r\n\t\t<Heur>1101</Heur>\r\n\t\t<NomFR1>SOUK AHAD</NomFR1>\r\n\t\t<NomAR1>??? ?????</NomAR1>\r\n\t\t<NomFR2>SOVIVA </NomFR2>\r\n\t\t<NomAR2>??????</NomAR2>\r\n\t\t<Prix>0.66</Prix>\r\n\t\t<IDTicket>1</IDTicket>\r\n\t\t<CodeRoute>107</CodeRoute>\r\n\t\t<origine>01</origine>\r\n\t\t<Distination>07</Distination>\r\n\t\t<Num>3</Num>\r\n\t\t<Ligne>107</Ligne>\r\n\t\t<requisition> </requisition>\r\n\t\t<voyage>0</voyage>\r\n\t\t<faveur> </faveur>\r\n\t</H_Ticket>",
"@version" => "1",
"@timestamp" => "2016-07-03T12:13:28.892Z",
"input_type" => "log",
"source" => "C:\\busesdata\\ticket2.xml",
"offset" => 125,
"type" => "ticket",
"count" => 1,
"fields" => nil,
"beat" => {
"hostname" => "hp-pavillion-g6",
"name" => "hp-pavillion-g6"
},
"host" => "hp-pavillion-g6",
"tags" => [
[0] "beats_input_codec_plain_applied"
]
}
As your stdout output shows, the Beats input puts each multiline event into the message field, and each event is rooted at <H_Ticket>, not <ticket>. Can you try pointing source at message and editing the xpath configuration in the filter as below:
filter {
  xml {
    source => "message"
    xpath => [
      "/H_Ticket/IDH_Ticket/text()", "ticketId",
      "/H_Ticket/CodeBus/text()", "codeBus",
      "/H_Ticket/CodeCh/text()", "codeCh",
      "/H_Ticket/CodeConv/text()", "codeConv",
      "/H_Ticket/Codeligne/text()", "codeLigne",
      "/H_Ticket/Date/text()", "date",
      "/H_Ticket/Heur/text()", "heure",
      "/H_Ticket/NomFR1/text()", "nomFR1",
      "/H_Ticket/NomAR1/text()", "nomAR1",
      "/H_Ticket/NomFR2/text()", "nomFR2",
      "/H_Ticket/NomAR2/text()", "nomAR2",
      "/H_Ticket/Prix/text()", "prix",
      "/H_Ticket/IDTicket/text()", "idTicket",
      "/H_Ticket/CodeRoute/text()", "codeRoute",
      "/H_Ticket/origine/text()", "origine",
      "/H_Ticket/Distination/text()", "destination",
      "/H_Ticket/Num/text()", "num",
      "/H_Ticket/Ligne/text()", "ligne",
      "/H_Ticket/requisition/text()", "requisition",
      "/H_Ticket/voyage/text()", "voyage",
      "/H_Ticket/faveur/text()", "faveur"
    ]
    store_xml => true
    target => "doc"
  }
}
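Before restarting Logstash, you can sanity-check that each multiline event is a well-formed document rooted at H_Ticket, which is why the absolute XPath expressions need the /H_Ticket/ prefix. A quick check with Python's standard library (the sample block is a trimmed version of one of your events, not your full data):

```python
import xml.etree.ElementTree as ET

# One multiline event, as it arrives in the "message" field
block = """<H_Ticket>
  <IDH_Ticket>26</IDH_Ticket>
  <CodeBus>186</CodeBus>
  <Prix>0.66</Prix>
</H_Ticket>"""

root = ET.fromstring(block)
# The document root is H_Ticket, so absolute XPath expressions
# in the Logstash filter must start with /H_Ticket/...
assert root.tag == "H_Ticket"
print(root.findtext("IDH_Ticket"))  # 26
print(root.findtext("Prix"))        # 0.66
```

If parsing fails here (for instance because a stray character like the extra '>' in your pasted sample slipped into the file), the Logstash xml filter will fail on that event too.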