Using Hive ntile results in where clause

Nadine picture Nadine · Jul 21, 2015 · Viewed 9.4k times · Source

I want to get summary data of the first quartile for a table in Hive. Below is a query to get the maximum number of views in each quartile:

SELECT NTILE(4) OVER (ORDER BY total_views) AS quartile, MAX(total_views)
FROM view_data
GROUP BY quartile
ORDER BY quartile;

And this query is to get the names of all the people that are in the first quartile:

SELECT name, NTILE(4) OVER (ORDER BY total_views) AS quartile
FROM view_data
WHERE quartile = 1

I get this error for both queries:

Invalid table alias or column reference 'quartile'

How can I reference the ntile results in the where clause or group by clause?

Answer

invoketheshell picture invoketheshell · Jul 21, 2015

You can't put a windowing function in a where clause because it would create ambiguity if there are compound predicates. So use a subquery.

select quartile, max(total_views) from
(SELECT total_views, NTILE(4) OVER (ORDER BY total_views) AS quartile,
FROM view_data) t
GROUP BY quartile
ORDER BY quartile
;

and

select * from 
(SELECT name, NTILE(4) OVER (ORDER BY total_views) AS quartile
FROM view_data) t
WHERE quartile = 1
;