Calculating percentile rank in MySQL

lhahne picture lhahne · Jun 29, 2009 · Viewed 38.3k times · Source

I have a very big table of measurement data in MySQL and I need to compute the percentile rank for each and every one of these values. Oracle appears to have a function called percent_rank but I can't find anything similar for MySQL. Sure I could just brute-force it in Python which I use anyways to populate the table but I suspect that would be quite inefficient because one sample might have 200.000 observations.

Answer

mattstuehler picture mattstuehler · Oct 25, 2011

Here's a different approach that doesn't require a join. In my case (a table with 15,000+) rows, it runs in about 3 seconds. (The JOIN method takes an order of magnitude longer).

In the sample, assume that measure is the column on which you're calculating the percent rank, and id is just a row identifier (not required):

SELECT
    id,
    @prev := @curr as prev,
    @curr := measure as curr,
    @rank := IF(@prev > @curr, @rank+@ties, @rank) AS rank,
    @ties := IF(@prev = @curr, @ties+1, 1) AS ties,
    (1-@rank/@total) as percentrank
FROM
    mytable,
    (SELECT
        @curr := null,
        @prev := null,
        @rank := 0,
        @ties := 1,
        @total := count(*) from mytable where measure is not null
    ) b
WHERE
    measure is not null
ORDER BY
    measure DESC

Credit for this method goes to Shlomi Noach. He writes about it in detail here:

http://code.openark.org/blog/mysql/sql-ranking-without-self-join

I've tested this in MySQL and it works great; no idea about Oracle, SQLServer, etc.