I'm having trouble with calculating the median of a list of values, not the average.
I found this article Simple way to calculate median with MySQL
It has a reference to the following query which I don't understand properly.
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2
If I have a time
column and I want to calculate the median value, what do the x
and y
columns refer to?
I propose a faster way.
Get the row count:
SELECT CEIL(COUNT(*)/2) FROM data;
Then take the middle value in a sorted subquery:
SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;
I tested this with a 5x10e6 dataset of random numbers and it will find the median in under 10 seconds.
This will find an arbitrary percentile by replacing the COUNT(*)/2
with COUNT(*)*n
where n
is the percentile (.5 for median, .75 for 75th percentile, etc).