Elasticsearch Aggregation by Day of Week and Hour of Day

Sambhav Sharma picture Sambhav Sharma · Mar 11, 2015 · Viewed 11.2k times · Source

I have documents of type:

[{"msg":"hello", date: "some-date"},{"msg":"hi!", date: "some-date"}, ...

I want to have the count of documents by day of week. For example x messages were sent on Monday and y were sent on Tuesday and so on.

I have used date_histogram with aggregation but it returns me the documents day wise. It does return me the day, but say "Wed, 22" and "Wed, 29" are returned as separate aggregation documents.

This is somewhat related to Elasticsearch - group by day of week and hour but there is no answer to that question so I am reposting it. According to the suggestion there it asks me to do term aggregation on key_as_string, but I need to add doc_count for every object instead of just count the terms. I also don't know how to use key_as_string in the nested aggregation.

This is what I have tried:

"aggs" : {
                "posts_over_days" : {
                    "date_histogram" : { 
                        "field" : "created_time", 
                        "interval": "day",
                        "format": "E" 
                    }
                }

Answer

RichS picture RichS · Aug 6, 2015

Re-post from my answer here: https://stackoverflow.com/a/31851896/6247

Does this help:

"aggregations": {
    "timeslice": {
        "histogram": {
            "script": "doc['timestamp'].value.getHourOfDay()",
            "interval": 1,
            "min_doc_count": 0,
            "extended_bounds": {
                "min": 0,
                "max": 23
            },
            "order": {
                "_key": "desc"
            }
        }
    }

This is nice, as it'll also include any hours with zero results, and, it'll extend the results to cover the entire 24 hour period (due to the extended_bounds).

You can use 'getDayOfWeek', 'getHourOfDay', ... (see 'Joda time' for more).

This is great for hours, but for days/months it'll give you a number rather than the month name. To work around, you can get the timeslot as a string - but, this won't work with the extended bounds approach, so you may have empty results (i.e. [Mon, Tues, Fri, Sun]).

In-case you want that, it is here:

"aggregations": {
    "dayOfWeek": {
        "terms": {
            "script": "doc['timestamp'].value.getDayOfWeek().getAsText()",
            "order": {
                "_term": "asc"
            }
        }
    }

Even if this doesn't help you, hopefully someone else will find it and benefit from it.