I'm working on a cohort analysis where I count the number of users who signed up by week, then count the number of events each perform over subsequent weeks. (Pretty standard stuff!)
After spending a lot of time understanding BigQuery nesting and array data, I decided to create two views to flatten nested user_dim and event_dim columns using wildcard over the entire data set (tables generated of events on a daily basis). That's how I wound up with these views, on which my query is based:
When I execute the following query joining those two views, I get the error "Failed to parse input string "20161111"":
SELECT
DATE_TRUNC(users.first_seen_date, WEEK) AS week,
COUNT(DISTINCT users.uid) AS signed_up_users,
COUNT(DISTINCT events.uid) AS logged_in_users,
CASE
WHEN COUNT(DISTINCT users.uid) > 0 THEN COUNT(DISTINCT events.uid) * 100 / COUNT(DISTINCT users.uid)
ELSE 0
END AS retention_pct
FROM
USERS_VIEW AS users
LEFT JOIN
EVENTS_VIEW AS events
ON
users.uid = events.uid
AND PARSE_DATE('%x', events.event_date) >= DATE_ADD(users.first_seen_date, INTERVAL 1 WEEK)
AND PARSE_DATE('%x', events.event_date) < DATE_ADD(users.first_seen_date, INTERVAL 2 WEEK)
GROUP BY
1
ORDER BY
1
I feel like this should be simple, but I can't figure out what formatting I'm missing to ensure the dates can be parsed. (And the UI doesn't tell me which line number is the offender.) I'm hoping it's a silly typo that someone else can see. Thanks in advance for any help!
I get the error "Failed to parse input string "20161111""
I think below will help to address that error
PARSE_DATE('%x', events.event_date)
should be
PARSE_DATE('%Y%m%d', events.event_date)
Also, optionally, you might want to change LEFT JOIN
to just JOIN