Bigquery SQL - Is it better to unnest in SELECT or JOIN?

A.S. picture A.S. · Oct 31, 2017 · Viewed 8.6k times · Source

I have a dataset where views are nested inside of sessions and I want a count of views for each session. What is the more efficient/proper way to structure such a query?

Is there any documentation that talks about the preferred way to write queries in BigQuery SQL?

SELECT session_key, ( SELECT COUNT( view_id ) FROM UNNEST( views ) views ) AS view_count 
FROM sessions 
WHERE _PARTITIONTIME >= TIMESTAMP( '2016-04-01' ) ;

SELECT session_key, COUNT( view_id ) AS view_count 
FROM sessions 
  LEFT JOIN UNNEST( views ) views 
WHERE _PARTITIONTIME >= TIMESTAMP( '2016-04-01' ) 
GROUP BY session_key; 

Thank you

Answer

Felipe Hoffa picture Felipe Hoffa · Oct 31, 2017

Working queries:

a)

SELECT visitId, ( SELECT COUNT( hitNumber ) FROM UNNEST( hits ) ) AS view_count 
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`

b)

SELECT visitId, COUNT( hitNumber ) AS view_count 
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
LEFT JOIN UNNEST( hits )  
GROUP BY visitId

The first query looks shorter and more concise, but let's also look at the explain tab:

a) enter image description here

b) enter image description here

It also looks better at execution time! Probably because the second query has a GROUP BY visitId, that forces BigQuery to look if there are any other sessions with the same id.

But if you are looking for an even more concise option:

SELECT visitId, ARRAY_LENGTH(hits) AS view_count 
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`