UNNEST expression references column which is neither grouped nor aggregated

Jivan picture Jivan · Nov 24, 2016 · Viewed 13k times · Source

Google Analytics BigQuery tables are structured like this (Legacy SQL notations - only relevant fields are shown):

visitId:                      INTEGER
hits:                         RECORD/REPEATED
hits.hour:                    INTEGER

On one such table, the following query works well:

SELECT
  visitId,
  MIN(h.hour) AS firstHitHour
FROM
  `my-table.ga_sessions_20161122`, UNNEST(hits) AS h
GROUP BY
  visitId

But using this alternative syntax:

SELECT
  visitId,
  (SELECT MIN(hour) FROM UNNEST(hits)) as firstHitHour
FROM
  `my-table.ga_sessions_20161122`
GROUP BY
  visitId

Triggers the following error:

Error: UNNEST expression references column hits which is neither grouped nor aggregated

I understand that UNNEST(hits) must be somehow grouped or aggregated, but since this column is an array (repeated), what does it mean exactly?

If I try to "group the column hits", as requested, like this:

(SELECT MIN(hour) FROM UNNEST(hits) as h GROUP BY h) as firstHitHour

Then I get a Grouping by expressions of type STRUCT is not allowed error.

How can this alternative syntax be corrected to produce the same result as the first one?

Answer

Mikhail Berlyant picture Mikhail Berlyant · Nov 24, 2016

My first Answer is for original version of this question.
When I answered, I realized you have changed it to quite different one :o)

So below answer is for most recent version of your question:

I think that in "alternative" version you just do not need GROUP BY at all, because you operate on original (un-flattened) row by row and for each row (visitId) you calculate firstHitHour

SELECT
  visitId,
  (SELECT MIN(hour) FROM UNNEST(hits)) as firstHitHour
FROM
  `my-table.ga_sessions_20161122`

In your initial query - you kind of flattening all records for each row - so that's why you need then to group them back