I want to get summary data of the first quartile for a table in Hive. Below is a query to get the maximum number of views in each quartile:
SELECT NTILE(4) OVER (ORDER BY total_views) AS quartile, MAX(total_views)
FROM view_data
GROUP BY quartile
ORDER BY quartile;
And this query is to get the names of all the people that are in the first quartile:
SELECT name, NTILE(4) OVER (ORDER BY total_views) AS quartile
FROM view_data
WHERE quartile = 1
I get this error for both queries:
Invalid table alias or column reference 'quartile'
How can I reference the ntile
results in the where
clause or group by
clause?
You can't put a windowing function in a where clause because it would create ambiguity if there are compound predicates. So use a subquery.
select quartile, max(total_views) from
(SELECT total_views, NTILE(4) OVER (ORDER BY total_views) AS quartile,
FROM view_data) t
GROUP BY quartile
ORDER BY quartile
;
and
select * from
(SELECT name, NTILE(4) OVER (ORDER BY total_views) AS quartile
FROM view_data) t
WHERE quartile = 1
;