clickhouse - how get count datetime per 1minute or 1day ,

aziminia picture aziminia · Jan 2, 2019 · Viewed 7.5k times · Source

I have a table in Clickhouse. for keep statistics and metrics.
and structure is:

datetime|metric_name|metric_value

I want to keep statistics and limit number of accesses in 1 minute, 1 hour, 1 day and so on. So I need event counts in last minute, hour or day for every metric_name and I want to prepare statistics in a chart.
I do not know how to make a query. I get the count of metrics statistics based on the exact for example 1 minute, 1 hour, 1 day and so on.

I used to work on inflxdb:

SELECT SUM(value) FROM `TABLE` WHERE `metric_name`=`metric_value` AND time >= now() - 1h GROUP BY time(5m) fill(0)

In fact, I want to get the number of each metric per 5 minutes in the previous 1 hour.

I do not know how to use aggregations for this problem

Answer

Amos picture Amos · Jan 2, 2019

ClickHouse has functions for generating Date/DateTime group buckets such as toStartOfWeek, toStartOfHour, toStartOfFiveMinute. You can also use intDiv function to manually divide value ranges. However the fill feature is still in the roadmap.

For example, you can rewrite the influx sql without the fill in ClickHouse like this,

SELECT SUM(value) FROM `TABLE` WHERE `metric_name`=`metric_value` AND 
time >= now() - 1h GROUP BY toStartOfFiveMinute(time)

You can also refer to this discussion https://github.com/yandex/ClickHouse/issues/379

update

There is a timeSlots function that can help generating empty buckets. Here is a working example

SELECT
    slot,
    metric_value_sum
FROM
(
    SELECT
        toStartOfFiveMinute(datetime) AS slot,
        SUM(metric_value) AS metric_value_sum
    FROM metrics
    WHERE (metric_name = 'k1') AND (datetime >= (now() - toIntervalHour(1)))
    GROUP BY slot
)
ANY RIGHT JOIN
(
    SELECT arrayJoin(timeSlots(now() - toIntervalHour(1), toUInt32(3600), 300)) AS slot
) USING (slot)