How to filter clickhouse table by array column contents?

MoshMcCabe picture MoshMcCabe · Dec 1, 2017 · Viewed 8k times · Source

I have a clickhouse table that has one Array(UInt16) column. I want to be able to filter results from this table to only get rows where the values in the array column are above a threshold value. I've been trying to achieve this using some of the array functions (arrayFilter and arrayExists) but I'm not familiar enough with the SQL/Clickhouse query syntax to get this working.

I've created the table using:

CREATE TABLE IF NOT EXISTS ArrayTest (
    date Date,
    sessionSecond UInt16,
    distance Array(UInt16)
) Engine = MergeTree(date, (date, sessionSecond), 8192);

Where the distance values will be distances from a certain point at a certain amount of seconds (sessionSecond) after the date. I've added some sample values so the table looks like the following:

Table with sample values

Now I want to get all rows which contain distances greater than 7. I found the array operators documentation here and tried the arrayExists function but it's not working how I'd expect. From the documentation, it says that this function "Returns 1 if there is at least one element in 'arr' for which 'func' returns something other than 0. Otherwise, it returns 0". But when I run the query below I get three zeros returned where I should get a 0 and two ones:

SELECT arrayExists(
    val -> val > 7,
    arrayEnumerate(distance))
FROM ArrayTest;

Eventually I want to perform this select and then join it with the table contents to only return rows that have an exists = 1 but I need this first step to work before that. Am I using the arrayExists wrong? What I found more confusing is that when I change the comparison value to 2 I get all 1s back. Can this kind of filtering be achieved using the array functions?

Thanks

Answer

Mikhail picture Mikhail · Dec 1, 2017

You can use arrayExists in the WHERE clause.

SELECT * 
FROM ArrayTest
WHERE arrayExists(x -> x > 7, distance) = 1;

Another way is to use ARRAY JOIN, if you need to know which values is greater than 7:

SELECT d, distance, sessionSecond 
FROM ArrayTest
ARRAY JOIN distance as d
WHERE d > 7