Calculating percentiles in SQL

user518206 picture user518206 · Jul 26, 2018 · Viewed 10.4k times · Source

This should be very straightforward, but as a newbie to SQL I am really struggling. I've been recommended to use PERCENTILE_CONT with continuous (non-discrete) data.

The data in question concerns two columns: (1) the IDs for a list of patients and (2) the average number of events per year.

Working from some code I found online, this is where I am up to

SELECT ID,
percentile_cont (0.25) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_25,
percentile_cont (0.50) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_50,
percentile_cont (0.75) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_75
FROM AE_COUNT;

This just seems to report thee columns each with identical values of PPPY.

Any idea where I'm going wrong?

Answer

Robert Corell picture Robert Corell · Jul 26, 2018

Assuming that you want to get the percentiles for the whole table, try this:

SELECT Distinct
percentile_cont (0.25) WITHIN GROUP
(ORDER BY PPPY ASC) OVER() as percentile_25,
percentile_cont (0.50) WITHIN GROUP
(ORDER BY PPPY ASC) OVER() as percentile_50,
percentile_cont (0.75) WITHIN GROUP
(ORDER BY PPPY ASC) OVER() as percentile_75
FROM AE_COUNT;

Removing the partition statement will run it against the whole table. I also removed the Id column form the select statement and made it distinct.

I would also like to point out that you said the second column is the average number of events per year. I have no idea what you need the percentiles for, but be aware that calculating a percentile of the averages of a group of sets will not yield the same result as calculating the percentiles of a union of the sets.