Merging multiple log files by date including multilines

Marco Behler · Apr 7, 2013

I have several logs whose lines all start with a timestamp, so the following merges them as expected:

cat myLog1.txt myLog2.txt | sort -n > combined.txt

The problem is that myLog2.txt can also contain lines without a timestamp (e.g. Java stack traces). Is there an easy way, without any custom scripts, to still merge them and preserve the multiline content?

Example myLog1.txt

11:48:18.825 [main] INFO  org.hibernate.cfg.Environment - HHH000206: hibernate.properties not found
11:48:55.784 [main] INFO  o.h.tool.hbm2ddl.SchemaUpdate - HHH000396: Updating schema

Example myLog2.txt

11:48:35.377 [qtp1484319352-19] ERROR c.w.b.c.ControllerErrorHandler -
org.springframework.beans.TypeMismatchException: Failed to convert value of type   'java.lang.String' to required type 'org.joda.time.LocalDate'; nested exception is    org.springframework.core.convert.ConversionFailedException: Failed to convert from type     java.lang.String to type @org.springframework.web.bind.annotation.RequestParam   @org.springframework.format.annotation.DateTimeFormat org.joda.time.LocalDate for value    '[2013-03-26]'; nested exception is java.lang.IllegalArgumentException: Invalid format: "    [2013-03-26]"
    at org.springframework.beans.TypeConverterSupport.doConvert(TypeConverterSupport.java:68) ~[spring-beans-3.2.1.RELEASE.jar:3.2.1.RELEASE]
at org.springframework.beans.TypeConverterSupport.convertIfNecessary(TypeConverterSupport.java:45) ~[spring-beans-3.2.1.RELEASE.jar:3.2.1.RELEASE]
at org.springframework.validation.DataBinder.convertIfNecessary(DataBinder.java:595) ~[spring-context-3.2.1.RELEASE.jar:3.2.1.RELEASE]
at org.springframework.web.method.annotation.AbstractNamedValueMethodArgumentResolver.resolveArgument(AbstractNamedValueMethodArgumentResolver.java:98) ~[spring-web-3.2.1.RELEASE.jar:3.2.1.RELEASE]
at org.springframework.web.method.support.HandlerMethodArgumentResolverComposite.resolveArgument(HandlerMethodArgumentResolverComposite.java:77) ~[spring-web-3.2.1.RELEASE.jar:3.2.1.RELEASE]
at org.springframework.web.method.support.InvocableHandlerMethod.getMethodArgumentValues(InvocableHandlerMethod.java:162) ~[spring-web-3.2.1.RELEAS

Expected output

11:48:18.825 [main] INFO  org.hibernate.cfg.Environment - HHH000206: hibernate.properties not found
11:48:35.377 [qtp1484319352-19] ERROR c.w.b.c.ControllerErrorHandler -
org.springframework.beans.TypeMismatchException: Failed to convert value of type   'java.lang.String' to required type 'org.joda.time.LocalDate'; nested exception is    org.springframework.core.convert.ConversionFailedException: Failed to convert from type     java.lang.String to type @org.springframework.web.bind.annotation.RequestParam   @org.springframework.format.annotation.DateTimeFormat org.joda.time.LocalDate for value    '[2013-03-26]'; nested exception is java.lang.IllegalArgumentException: Invalid format: "    [2013-03-26]"
    at org.springframework.beans.TypeConverterSupport.doConvert(TypeConverterSupport.java:68) ~[spring-beans-3.2.1.RELEASE.jar:3.2.1.RELEASE]
at org.springframework.beans.TypeConverterSupport.convertIfNecessary(TypeConverterSupport.java:45) ~[spring-beans-3.2.1.RELEASE.jar:3.2.1.RELEASE]
at org.springframework.validation.DataBinder.convertIfNecessary(DataBinder.java:595) ~[spring-context-3.2.1.RELEASE.jar:3.2.1.RELEASE]
at org.springframework.web.method.annotation.AbstractNamedValueMethodArgumentResolver.resolveArgument(AbstractNamedValueMethodArgumentResolver.java:98) ~[spring-web-3.2.1.RELEASE.jar:3.2.1.RELEASE]
at org.springframework.web.method.support.HandlerMethodArgumentResolverComposite.resolveArgument(HandlerMethodArgumentResolverComposite.java:77) ~[spring-web-3.2.1.RELEASE.jar:3.2.1.RELEASE]
at org.springframework.web.method.support.InvocableHandlerMethod.getMethodArgumentValues(InvocableHandlerMethod.java:162) ~[spring-web-3.2.1.RELEAS
11:48:55.784 [main] INFO  o.h.tool.hbm2ddl.SchemaUpdate - HHH000396: Updating schema

Thanks Marco

Answer

topr · Nov 26, 2014

I was struggling with the same issue and I think I've finally got it. Try it like this:

sort -nbms -k1.1,1.2 -k1.4,1.5 -k1.7,1.8 -k1.10,1.12 myLog1.txt myLog2.txt > combined.txt

It's still not fully clear to me, but I'll try to give some explanation. According to the man page, the switches used mean:

-n, --numeric-sort - compare according to string numerical value.

-b, --ignore-leading-blanks - ignore leading blanks.

-s, --stable - stabilize sort by disabling last-resort comparison

-m, --merge - merge already sorted files; do not sort

-k, --key=POS1[,POS2] - start a key at POS1 (origin 1), end it at POS2 (default end of line)
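The way -n and -k interact can be checked on a tiny input. This is a minimal sketch, assuming GNU sort; the scratch directory, the file name in.txt and the two sample lines are made up for the demo. It sorts two same-hour timestamps first with the whole first field as the key, then with one key per timestamp component.

```shell
#!/bin/sh
# Hypothetical scratch directory and file name, used only for this demo.
demo=$(mktemp -d)

# Two timestamps in the same hour, deliberately out of order.
printf '%s\n' '11:48:55.784 later' '11:48:18.825 earlier' > "$demo/in.txt"

# Whole first field as a numeric key: parsing stops at the first colon,
# so both keys evaluate to 11 and -s leaves the lines in input order.
by_field=$(LC_ALL=C sort -s -n -k1,1 "$demo/in.txt")

# One key per component: hours, minutes, seconds, milliseconds.
by_chars=$(LC_ALL=C sort -s -n -k1.1,1.2 -k1.4,1.5 -k1.7,1.8 -k1.10,1.12 "$demo/in.txt")

printf 'whole field:\n%s\n\nper component:\n%s\n' "$by_field" "$by_chars"
```

With the whole-field key the lines should come back in their original order, while the per-component keys should put the 11:48:18.825 line first.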

  • The log files are already ordered, so we don't need to sort them again, only determine which line goes where when merging. That's why -m is used. It's crucial for keeping stack traces from getting scrambled.
  • -b is not necessary in this case, as somehow -n and -m combined keep stack trace lines from getting clustered. I left it in just in case, as most stack trace lines start with blanks.
  • -n apparently stops comparing a key as soon as it reaches a non-numeric character. That's the second crucial bit for keeping stack traces in place. Importantly, with -n -k1,1 it would only sort the log files by hour, as the colon is non-numeric. Apart from that, -n speeds up numeric comparison, so we would like to have it anyway.
  • The problem mentioned in the previous point is solved by pointing to specific character positions in each key; that's why -k1.1,1.2 (first and second digit of the hour), -k1.4,1.5 (first and second digit of the minutes) and so on. The number before the dot is always '1' as it points to the first field of the line (which in our case is the time). In short, it's -kA,B where A and B are positions in a given line (by default fields are separated by blanks). The format used for A and B is field.character. Keep in mind that whenever there is a non-numeric character between A and B, everything after it will be ignored in the comparison if -n is used.
  • -s disables the default behaviour, which is: whenever the keys being compared are equal, a full string comparison of the lines is done as a last resort. We don't want that, because we want to preserve the original order of the log entries. Not sure if it's necessary with -m though.
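The points above can be tried end to end with a small reproduction. This is a sketch under assumptions: GNU sort, a hypothetical scratch directory, and shortened sample logs (log1.txt, log2.txt and their contents are invented for the demo, not the real logs from the question). One plausible reading of why it works: under -n a line with no leading digits compares as 0, so during the merge it is always emitted before any timestamped line still waiting in the other file.

```shell
#!/bin/sh
# Hypothetical scratch directory and shortened sample logs for the demo.
workdir=$(mktemp -d)

cat > "$workdir/log1.txt" <<'EOF'
11:48:18.825 [main] INFO first
11:48:55.784 [main] INFO third
EOF

# The second log has a timestamped line followed by stack trace lines.
cat > "$workdir/log2.txt" <<'EOF'
11:48:35.377 [worker] ERROR second
org.example.SomeException: boom
    at org.example.Foo.bar(Foo.java:68)
EOF

# Merge (-m) the two already-sorted files on hour, minute, second and
# millisecond, keeping tied lines in their original order (-s).
LC_ALL=C sort -nbms -k1.1,1.2 -k1.4,1.5 -k1.7,1.8 -k1.10,1.12 \
    "$workdir/log1.txt" "$workdir/log2.txt" > "$workdir/combined.txt"

cat "$workdir/combined.txt"
```

If that reading is right, a continuation line that happens to start with digits, or a log entry stamped 00:00:00.000, could still end up in the wrong place, so it is worth checking the result on real logs.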