Split access.log file by dates using command line tools

mr.b picture mr.b · Jul 27, 2012 · Viewed 17.9k times · Source

I have a Apache access.log file, which is around 35GB in size. Grepping through it is not an option any more, without waiting a great deal.

I wanted to split it in many small files, by using date as splitting criteria.

Date is in format [15/Oct/2011:12:02:02 +0000]. Any idea how could I do it using only bash scripting, standard text manipulation programs (grep, awk, sed, and likes), piping and redirection?

Input file name is access.log. I'd like output files to have format such as access.apache.15_Oct_2011.log (that would do the trick, although not nice when sorting.)

Answer

Theodore R. Smith picture Theodore R. Smith · Jul 30, 2012

One way using awk:

awk 'BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec ", months, " ")
    for (a = 1; a <= 12; a++)
        m[months[a]] = sprintf("%02d", a)
}
{
    split($4,array,"[:/]")
    year = array[3]
    month = m[array[2]]

    print > FILENAME"-"year"_"month".txt"
}' incendiary.ws-2009

This will output files like:

incendiary.ws-2010-2010_04.txt
incendiary.ws-2010-2010_05.txt
incendiary.ws-2010-2010_06.txt
incendiary.ws-2010-2010_07.txt

Against a 150 MB log file, the answer by chepner took 70 seconds on an 3.4 GHz 8 Core Xeon E31270, while this method took 5 seconds.

Original inspiration: "How to split existing apache logfile by month?"