I need to calculate median value of a numeric sequence in Google BigQuery efficiently. Is the same possible?
Yeah it's possible with PERCENTILE_CONT window function.
Returns values that are based upon linear interpolation between the values of the group, after ordering them per the ORDER BY clause.
must be between 0 and 1.
This window function requires ORDER BY in the OVER clause.
So an example query would be like (the max() is there just to work across the group by but it's not being used as a math logic, should not confuse you)
SELECT room,
max(median) FROM (SELECT room,
percentile_cont(0.5) OVER (PARTITION BY room
ORDER BY temperature) AS median FROM
(SELECT 1 AS room,
11 AS temperature),
(SELECT 1 AS room,
12 AS temperature),
(SELECT 1 AS room,
14 AS temperature),
(SELECT 1 AS room,
19 AS temperature),
(SELECT 1 AS room,
13 AS temperature),
(SELECT 2 AS room,
20 AS temperature),
(SELECT 2 AS room,
21 AS temperature),
(SELECT 2 AS room,
29 AS temperature),
(SELECT 3 AS room,
30 AS temperature)) GROUP BY room
This returns:
+------+-------------+
| room | temperature |
+------+-------------+
| 1 | 13 |
| 2 | 21 |
| 3 | 30 |
+------+-------------+