Average of latest N records per group

JV- picture JV- · Jun 5, 2013 · Viewed 10.1k times · Source

My current application calculates a point average based on all records for each user:

SELECT `user_id`, AVG(`points`) AS pts 
FROM `players` 
WHERE `points` != 0 
GROUP BY `user_id`

The business requirement has changed and I need to calculate the average based on the last 30 records for each user.

The relevant tables have the following structure:

table: players; columns: player_id, user_id, match_id, points

table: users; columns: user_id

The following query does not work, but it does demonstrate the logic that I am trying to implement.

SELECT @user_id := u.`id`, (
    -- Calculate the average for last 30 records
    SELECT AVG(plr.`points`) 
    FROM (
        -- Select the last 30 records for evaluation
        SELECT p.`points` 
        FROM `players` AS p 
        WHERE p.`user_id`=@user_id 
        ORDER BY `match_id` DESC 
        LIMIT 30
    ) AS plr
) AS avg_points 
FROM `users` AS u

Is there a fairly efficient way to calculate the averages based on the latest 30 records for each user?

Answer

TMS picture TMS · Jun 8, 2013

There is no reason to reinvent a wheel and risk you have a buggy, suboptimal code. Your problem is trivial extension of common per group limit problem. There are already tested and optimized solutions to solve this problem, and from this resource I would recommend to choose from following two solutions. These queries produce latest 30 records for each player (rewritten for your tables):

select user_id, points
from players
where (
   select count(*) from players as p
   where p.user_id = players.user_id and p.player_id >= players.player_id
) <= 30;

(Just to make sure I understand your structure: I suppose player_id is a unique key in players table and that one user can be present in this table as multiple players.)

Second tested and optimized solution is to use MySQL variables:

set @num := 0, @user_id := -1;

select user_id, points,
      @num := if(@user_id = user_id, @num + 1, 1) as row_number,
      @user_id := user_id as dummy
from players force index(user_id) /* optimization */
group by user_id, points, player_id /* player_id should be necessary here */
having row_number <= 30;

The first query will not be as optimial (is quadratic), while the second query is optimal (one-pass), but will only work in MySQL. The choice is up to you. If you go for the second technique, beware and test it properly with your keys and database setting; they suggest in some circumstances it might stop working.

Your final query is trivial:

select user_id, avg(points)
from ( /* here goes one of the above solutions; 
          the "set" commands should go before this big query */ ) as t
group by user_id

Note that I have not incorporated the condition you have in your 1st query (points != 0) as I don't understand your requirement well (you haven't described it), and I also think this answer should be general enough to help others with similar problem.