I have a message that flows through several systems, each system logs message entry and exit with a timestamp and a uuid messageId. I'm ingesting all logs through:
filebeat --> logstash --> elastic search --> kibana
As a result I now have these events:
@timestamp messageId event
May 19th 2016, 02:55:29.003 00e02f2f-32d5-9509-870a-f80e54dc8775 system1Enter
May 19th 2016, 02:55:29.200 00e02f2f-32d5-9509-870a-f80e54dc8775 system1Exit
May 19th 2016, 02:55:29.205 00e02f2f-32d5-9509-870a-f80e54dc8775 system2Enter
May 19th 2016, 02:55:29.453 00e02f2f-32d5-9509-870a-f80e54dc8775 system2Exit
I would like to produce a report (ideally a stacked bar or column) of time spent in each system:
messageId in1:1->2:in2
00e02f2f-32d5-9509-870a-f80e54dc8775 197:5:248
What is the best way to do this? Logstash filters? kibana calculated fields?
You can achieve this with the Logstash aggregate
filter only, however, you'd have to substantially re-implement what the elapsed
filter already does, so that'd be a shame, right?
Let's then use a mix of the Logstash aggregate
filter and the elapsed
filter. The latter is used to measure the time of each stage and the former is used to aggregate all the timing information into the last event.
Side note: you might want to rethink your timestamp format to make it something more standard for parsing. I've transformed them to ISO 8601 to make it easier to parse, but feel free to roll your own regex.
So I'm starting from the following logs:
2016-05-19T02:55:29.003 00e02f2f-32d5-9509-870a-f80e54dc8775 system1Enter
2016-05-19T02:55:29.200 00e02f2f-32d5-9509-870a-f80e54dc8775 system1Exit
2016-05-19T02:55:29.205 00e02f2f-32d5-9509-870a-f80e54dc8775 system2Enter
2016-05-19T02:55:29.453 00e02f2f-32d5-9509-870a-f80e54dc8775 system2Exit
First I'm using three elapsed
filters (one for each stage in1
, 1->2
and in2
) and then three aggregate filters in order to gather all the timing information. It looks like this:
filter {
grok {
match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{UUID:messageId} %{WORD:event}"]
add_tag => [ "%{event}" ]
}
date {
match => [ "timestamp", "ISO8601"]
}
# Measures the execution time of system1
elapsed {
unique_id_field => "messageId"
start_tag => "system1Enter"
end_tag => "system1Exit"
new_event_on_match => true
add_tag => ["in1"]
}
# Measures the execution time of system2
elapsed {
unique_id_field => "messageId"
start_tag => "system2Enter"
end_tag => "system2Exit"
new_event_on_match => true
add_tag => ["in2"]
}
# Measures the time between system1 and system2
elapsed {
unique_id_field => "messageId"
start_tag => "system1Exit"
end_tag => "system2Enter"
new_event_on_match => true
add_tag => ["1->2"]
}
# Records the execution time of system1
if "in1" in [tags] and "elapsed" in [tags] {
aggregate {
task_id => "%{messageId}"
code => "map['report'] = [(event['elapsed_time']*1000).to_i]"
map_action => "create"
}
}
# Records the time between system1 and system2
if "1->2" in [tags] and "elapsed" in [tags] {
aggregate {
task_id => "%{messageId}"
code => "map['report'] << (event['elapsed_time']*1000).to_i"
map_action => "update"
}
}
# Records the execution time of system2
if "in2" in [tags] and "elapsed" in [tags] {
aggregate {
task_id => "%{messageId}"
code => "map['report'] << (event['elapsed_time']*1000).to_i; event['report'] = map['report'].join(':')"
map_action => "update"
end_of_task => true
}
}
}
After the first two events, you'll get a new event like this, which shows that 197ms have been spent in system1:
{
"@timestamp" => "2016-05-21T04:20:51.731Z",
"tags" => [ "elapsed", "elapsed_match", "in1" ],
"elapsed_time" => 0.197,
"messageId" => "00e02f2f-32d5-9509-870a-f80e54dc8775",
"elapsed_timestamp_start" => "2016-05-19T00:55:29.003Z"
}
After the third event, you'll get an event like this, which shows how much time is spent between system1 and system2, i.e. 5ms:
{
"@timestamp" => "2016-05-21T04:20:51.734Z",
"tags" => [ "elapsed", "elapsed_match", "1->2" ],
"elapsed_time" => 0.005,
"messageId" => "00e02f2f-32d5-9509-870a-f80e54dc8775",
"elapsed_timestamp_start" => "2016-05-19T00:55:29.200Z"
}
After the fourth event, you'll get a new event like this one, which shows how much time was spent in system2, i.e. 248ms. That event also contains a report
field with all the timing information of the message
{
"@timestamp" => "2016-05-21T04:20:51.736Z",
"tags" => [ "elapsed", "elapsed_match", "in2" ],
"elapsed_time" => 0.248,
"messageId" => "00e02f2f-32d5-9509-870a-f80e54dc8775",
"elapsed_timestamp_start" => "2016-05-19T00:55:29.205Z"
"report" => "197:5:248"
}