Snowflake: "SQL compilation error:... is not a valid group by expression"

HarryD picture HarryD · Nov 21, 2019 · Viewed 14k times · Source

Without resorting to CTEs or a sub-query is there any way to use Window functionality with a different summary level than the GROUP BY? COUNT(*) works, but if a column name is specified in the COUNT or the SUM function is used, the query errors with "is not a valid group by expression". Even if the PARTITION BY columns are the same as the GROUP BY the error results.

The commented out lines will cause the query to fail. It's precisely for these types of things that one would want to use Window functionality in the first place.

create table sales (product_id integer, retail_price real, quantity integer, city varchar, state varchar);
insert into sales (product_id, retail_price, quantity, city, state) values 
(1, 2.00,  1, 'SF', 'CA'),
(1, 2.00,  2, 'SJ', 'CA'),
(2, 5.00,  4, 'SF', 'CA'),
(2, 5.00,  8, 'SJ', 'CA'),
(2, 5.00, 16, 'Miami', 'FL'),
(2, 5.00, 32, 'Orlando', 'FL'),
(2, 5.00, 64, 'SJ', 'PR');

select  city, state
,   count(*) as city_sale_cnt
,   count(*) over ( partition by state) as state_sale_cnt
--  ,   count(product_id) over ( partition by state) as state_sale_cnt2
,   sum(retail_price) as city_price
--  ,   sum(retail_price) over ( partition by state) as state_price

from sales
group by 1,2;

The docs indicate Window functionality might cause problems, including the vague warning "PARTITION BY is not always compatible with GROUP BY.": The error message SQL compilation error: ... is not a valid group by expression is often a sign that different columns in the SELECT statement’s “project” clauses are not partitioned the same way and therefore might produce different numbers of rows.

Answer

Gordon Linoff picture Gordon Linoff · Nov 21, 2019

The commented out code is not correct. The reason is that the window function is parsed "after" the group by, and there is no product_id or retail_price after the group by.

This is easily fixed:

select city, state,
       count(*) as city_sale_cnt,
       count(*) over (partition by state) as state_sale_cnt,
       sum(count(product_id)) over (partition by state) as ,
       sum(retail_price) as city_price,
       sum(sum(retail_price)) over ( partition by state) as state_price
from sales
group by 1, 2;

At first, using window functions in an aggregation query looks a bit confusing -- the nested aggregation functions look awkward. I find, though that it is pretty easy to get used to the syntax, once you have used it a few times.