I have the following MySQL query which is working perfectly:
select
count(*) as `# of Data points`,
name,
max((QNTY_Sell/QNTYDelivered)*1000) as `MAX Thousand Price`,
min((QNTY_Sell/QNTYDelivered)*1000) as `MIN Thousand Price`,
avg((QNTY_Sell/QNTYDelivered)*1000) as `MEAN Thousand Price`
from
table_name
where
year(date) >= 2012 and
name like "%the_name%" and
QNTYDelivered > 0 and
QNTY_Sell > 0
group by name
order by name;
Now I wish to also add a result column that gives me the MEDIAN of the data for each line. Under SELECT
this would look like this in a perfect world:
median((QNTY_Sell/QNTYDelivered)*1000) as `MEDIAN Thousand Price`
Searching Google for a MySQL median function brought me to this answer, which seems ok if you are interested in the median of a data set for an entire table: Simple way to calculate median with MySQL
The difference here is that I am grouping the data in my table by the name
column, and want to get the median for each line of the data grouped by this column.
Does anyone know a nifty trick to allow me to do this?
Thanks!
The only way I found to do this is through string manipulation:
with GROUP_CONCAT
a list of all value is created then with indented SUBSTRING_INDEX
the median value is taken
SELECT
count(*) AS `# of Data points`,
name,
max((QNTY_Sell/QNTYDelivered)*1000) AS `MAX Thousand Price`,
min((QNTY_Sell/QNTYDelivered)*1000) AS `MIN Thousand Price`,
avg((QNTY_Sell/QNTYDelivered)*1000) AS `MEAN Thousand Price`
, CASE (count(*) % 2)
WHEN 1 THEN SUBSTRING_INDEX(
SUBSTRING_INDEX(
group_concat((QNTY_Sell/QNTYDelivered)*1000
ORDER BY (QNTY_Sell/QNTYDelivered)*1000 SEPARATOR ',')
, ',', (count(*) + 1) / 2)
, ',', -1)
ELSE (SUBSTRING_INDEX(
SUBSTRING_INDEX(
group_concat((QNTY_Sell/QNTYDelivered)*1000
ORDER BY (QNTY_Sell/QNTYDelivered)*1000 SEPARATOR ',')
, ',', count(*) / 2)
, ',', -1)
+ SUBSTRING_INDEX(
SUBSTRING_INDEX(
group_concat((QNTY_Sell/QNTYDelivered)*1000
ORDER BY (QNTY_Sell/QNTYDelivered)*1000 SEPARATOR ',')
, ',', (count(*) + 1) / 2)
, ',', -1)) / 2
END median
FROM
sales
WHERE
year(date) >= 2012 AND
name LIKE "%art.%" AND
QNTYDelivered > 0 AND
QNTY_Sell > 0
GROUP BY name
ORDER BY name;
The CASE is needed to check if we have a single median value, with an odd number of values, or two median values, with an even number of values, in the second case the median is the mean of the two values founded.