MySQL "ORDER BY" the amount of rows with the same value for a certain column?

Simon picture Simon · Dec 31, 2009 · Viewed 8.8k times · Source

I have a table called trends_points, this table has the following columns:

  • id (the unique id of the row)
  • userId (the id of the user that has entered this in the table)
  • term (a word)
  • time (a unix timestamp)

Now, I'm trying to run a query on this table which will get the rows in a specific time frame ordered by how many times the column term appears in the table during the specific timeframe...So for example if the table has the following rows:

id | userId | term        | time
------------------------------------
1    28       new year      1262231638
2    37       new year      1262231658
3    1        christmas     1262231666
4    34       new year      1262231665
5    12       christmas     1262231667
6    52       twitter       1262231669

I'd like the rows to come out ordered like this:

new year
christmas
twitter

This is because "new year" exists three times in the timeframe, "christmas" exists twice and "twitter" is only in one row.

So far I've asummed it's a simple WHERE for the specific timeframe part of the query and a GROUP BY to stop the same term from coming up twice in the list.

This makes the following query:

SELECT * 
  FROM `trends_points` 
 WHERE ( time >= <time-period_start> 
  AND time <= <time-period_end> ) 
GROUP BY `term`

Does anyone know how I'd do the final part of the query? (Ordering the query's results by how many rows contain the same "term" column value..).

Answer

OMG Ponies picture OMG Ponies · Dec 31, 2009

Use:

  SELECT tp.term,
         COUNT(*) 'term_count'
    FROM TREND_POINTS tp
   WHERE tp.time BETWEEN <time-period_start> AND <time-period_end> 
GROUP BY tp.term
ORDER BY term_count DESC, tp.term

See this question about why to use BETWEEN vs using the >=/<= operators.

Keep in mind there can be ties - the order by defaults to alphabetically shorting by term value when this happens, but there could be other criteria.

Also, if you want to additionally limit the number of rows/terms coming back you can add the LIMIT clause to the end of the query. For example, this query will return the top five terms:

  SELECT tp.term,
         COUNT(*) 'term_count'
    FROM TREND_POINTS tp
   WHERE tp.time BETWEEN <time-period_start> AND <time-period_end> 
GROUP BY tp.term
ORDER BY term_count DESC, tp.term
   LIMIT 5