This should be very straightforward, but as a newbie to SQL I am really struggling. I've been recommended to use PERCENTILE_CONT with continuous (non-discrete) data.
The data in question concerns two columns: (1) the IDs for a list of patients and (2) the average number of events per year.
Working from some code I found online, this is where I am up to
SELECT ID,
percentile_cont (0.25) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_25,
percentile_cont (0.50) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_50,
percentile_cont (0.75) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_75
FROM AE_COUNT;
This just seems to report thee columns each with identical values of PPPY.
Any idea where I'm going wrong?
Assuming that you want to get the percentiles for the whole table, try this:
SELECT Distinct
percentile_cont (0.25) WITHIN GROUP
(ORDER BY PPPY ASC) OVER() as percentile_25,
percentile_cont (0.50) WITHIN GROUP
(ORDER BY PPPY ASC) OVER() as percentile_50,
percentile_cont (0.75) WITHIN GROUP
(ORDER BY PPPY ASC) OVER() as percentile_75
FROM AE_COUNT;
Removing the partition statement will run it against the whole table. I also removed the Id column form the select statement and made it distinct.
I would also like to point out that you said the second column is the average number of events per year. I have no idea what you need the percentiles for, but be aware that calculating a percentile of the averages of a group of sets will not yield the same result as calculating the percentiles of a union of the sets.