I have a table called trends_points
, this table has the following columns:
Now, I'm trying to run a query on this table which will get the rows in a specific time frame ordered by how many times the column term
appears in the table during the specific timeframe...So for example if the table has the following rows:
id | userId | term | time
------------------------------------
1 28 new year 1262231638
2 37 new year 1262231658
3 1 christmas 1262231666
4 34 new year 1262231665
5 12 christmas 1262231667
6 52 twitter 1262231669
I'd like the rows to come out ordered like this:
new year
christmas
twitter
This is because "new year" exists three times in the timeframe, "christmas" exists twice and "twitter" is only in one row.
So far I've asummed it's a simple WHERE for the specific timeframe part of the query and a GROUP BY to stop the same term from coming up twice in the list.
This makes the following query:
SELECT *
FROM `trends_points`
WHERE ( time >= <time-period_start>
AND time <= <time-period_end> )
GROUP BY `term`
Does anyone know how I'd do the final part of the query? (Ordering the query's results by how many rows contain the same "term" column value..).
Use:
SELECT tp.term,
COUNT(*) 'term_count'
FROM TREND_POINTS tp
WHERE tp.time BETWEEN <time-period_start> AND <time-period_end>
GROUP BY tp.term
ORDER BY term_count DESC, tp.term
See this question about why to use BETWEEN vs using the >=/<= operators.
Keep in mind there can be ties - the order by defaults to alphabetically shorting by term value when this happens, but there could be other criteria.
Also, if you want to additionally limit the number of rows/terms coming back you can add the LIMIT
clause to the end of the query. For example, this query will return the top five terms:
SELECT tp.term,
COUNT(*) 'term_count'
FROM TREND_POINTS tp
WHERE tp.time BETWEEN <time-period_start> AND <time-period_end>
GROUP BY tp.term
ORDER BY term_count DESC, tp.term
LIMIT 5