My current application calculates a point average based on all records for each user:
SELECT `user_id`, AVG(`points`) AS pts
FROM `players`
WHERE `points` != 0
GROUP BY `user_id`
The business requirement has changed and I need to calculate the average based on the last 30 records for each user.
The relevant tables have the following structure:
table: players; columns: player_id, user_id, match_id, points
table: users; columns: user_id
The following query does not work, but it does demonstrate the logic that I am trying to implement.
SELECT @user_id := u.`id`, (
-- Calculate the average for last 30 records
SELECT AVG(plr.`points`)
FROM (
-- Select the last 30 records for evaluation
SELECT p.`points`
FROM `players` AS p
WHERE p.`user_id`=@user_id
ORDER BY `match_id` DESC
LIMIT 30
) AS plr
) AS avg_points
FROM `users` AS u
Is there a fairly efficient way to calculate the averages based on the latest 30 records for each user?
There is no reason to reinvent a wheel and risk you have a buggy, suboptimal code. Your problem is trivial extension of common per group limit problem. There are already tested and optimized solutions to solve this problem, and from this resource I would recommend to choose from following two solutions. These queries produce latest 30 records for each player (rewritten for your tables):
select user_id, points
from players
where (
select count(*) from players as p
where p.user_id = players.user_id and p.player_id >= players.player_id
) <= 30;
(Just to make sure I understand your structure: I suppose player_id
is a unique key in players table and that one user can be present in this table as multiple players.)
Second tested and optimized solution is to use MySQL variables:
set @num := 0, @user_id := -1;
select user_id, points,
@num := if(@user_id = user_id, @num + 1, 1) as row_number,
@user_id := user_id as dummy
from players force index(user_id) /* optimization */
group by user_id, points, player_id /* player_id should be necessary here */
having row_number <= 30;
The first query will not be as optimial (is quadratic), while the second query is optimal (one-pass), but will only work in MySQL. The choice is up to you. If you go for the second technique, beware and test it properly with your keys and database setting; they suggest in some circumstances it might stop working.
Your final query is trivial:
select user_id, avg(points)
from ( /* here goes one of the above solutions;
the "set" commands should go before this big query */ ) as t
group by user_id
Note that I have not incorporated the condition you have in your 1st query (points != 0)
as I don't understand your requirement well (you haven't described it), and I also think this answer should be general enough to help others with similar problem.