HAVING clause without grouping by all non-aggregate columns in SELECT
I am often faced with tables that have an id column and a month column along with many other metrics. I weed out certain rows based on other columns. I only want to id’s that have all 12 months’ worth of data, so I do the following:
proc sql;
create table t as
SELECT *
FROM T1
GROUP BY id
HAVING COUNT(id) = 12
quit;
It seems to work for me, but I was wondering if there are any dangers with this method. I know I can join an aggregated subquery with only id and the count of ids, but this method is much easier to explain to non sql users.
In general, can you used the HAVING
clause with only a couple of the columns in the GROUP BY
and no aggregate functions in the SELECT
statement?
It depends on the database. On one of mine (redbrick) the query your presented will throw an error. To make it work, I'd have to change
select *
to
select id.
However, you might have misguided motivation. I am usually interested in the most efficient way to get the correct answer. Clarity to myself and anyone else who might have to maintain it is also important, but clarity to laypeople is not one of my priorities.