I have a dataset where views are nested inside of sessions and I want a count of views for each session. What is the more efficient/proper way to structure such a query?
Is there any documentation that talks about the preferred way to write queries in BigQuery SQL?
SELECT session_key, ( SELECT COUNT( view_id ) FROM UNNEST( views ) views ) AS view_count
FROM sessions
WHERE _PARTITIONTIME >= TIMESTAMP( '2016-04-01' ) ;
SELECT session_key, COUNT( view_id ) AS view_count
FROM sessions
LEFT JOIN UNNEST( views ) views
WHERE _PARTITIONTIME >= TIMESTAMP( '2016-04-01' )
GROUP BY session_key;
Thank you
Working queries:
a)
SELECT visitId, ( SELECT COUNT( hitNumber ) FROM UNNEST( hits ) ) AS view_count
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
b)
SELECT visitId, COUNT( hitNumber ) AS view_count
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
LEFT JOIN UNNEST( hits )
GROUP BY visitId
The first query looks shorter and more concise, but let's also look at the explain tab:
It also looks better at execution time! Probably because the second query has a GROUP BY visitId
, that forces BigQuery to look if there are any other sessions with the same id.
But if you are looking for an even more concise option:
SELECT visitId, ARRAY_LENGTH(hits) AS view_count
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`