I have a table in SQL that looks like this:
user_id | data1
0 | 6
0 | 6
0 | 6
0 | 1
0 | 1
0 | 2
1 | 5
1 | 5
1 | 3
1 | 3
1 | 3
1 | 7
I want to write a query that returns two columns: a column for the user id, and a column for what the most frequently occurring value per id is. In my example, for user_id 0, the most frequent value is 6, and for user_id 1, the most frequent value is 3. I would want it to look like below:
user_id | most_frequent_value
0 | 6
1 | 3
I am using the query below to get the most frequent value, but it runs against the whole table and returns the most common value for the whole table instead of for each id. What would I need to add to my query to get it to return the most frequent value for each id? I am thinking I need to use a subquery, but am unsure of how to structure it.
SELECT user_id, data1 AS most_frequent_value
FROM my_table
GROUP BY user_id, data1
ORDER BY COUNT(*) DESC LIMIT 1
You can use a window function to rank the userids based on their count of data1.
WITH cte AS (
SELECT
user_id
, data1
, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY COUNT(data1) DESC) rn
FROM dbo.YourTable
GROUP BY
user_id,
data1)
SELECT
user_id,
data1
FROM cte WHERE rn = 1