I am unable to apply a proper regex on customtarget column in bigquery.
With normal MSSQL:
SELECT * from mytable where CustomTargeting like = '%u=%' -- is all okay
With Bigquery(legacy-sql) :
SELECT REGEXP_EXTRACT(CustomTargeting, r'[^u=\d]') as validate_users
from [project:dataset.impressions_4213_20181112] Limit 10
Error:
Exactly one capturing group must be specified
Update:
Yet couldn't get substring u ='anystring'
How can I extract data where CustomTargeting ='%u=somestring%'?
For BigQuery Legacy SQL
In SELECT
statement list you can use
SELECT REGEXP_EXTRACT(CustomTargeting, r'(?:^|;)u=(\d*)')
In WHERE
clause - you can use
WHERE REGEXP_MATCH(CustomTargeting, r'(?:^|;)u=(\d*)')
So, you query can look like
#legacySQL
SELECT CustomTargeting, REGEXP_EXTRACT(CustomTargeting, r'(?:^|;)u=(\d*)')
FROM [project:dataset.impressions_4213_20181112]
WHERE REGEXP_MATCH(CustomTargeting, r'(?:^|;)u=(\d*)')
For BigQuery Standard SQL
Same for SELECT
But different for WHERE
- WHERE REGEXP_CONTAINS(CustomTargeting, r'(?:^|;)u=(\d*)')
#standardSQL
SELECT CustomTargeting, REGEXP_EXTRACT(CustomTargeting, r'(?:^|;)u=(\d*)')
FROM `project.dataset.impressions_4213_20181112`
WHERE REGEXP_CONTAINS(CustomTargeting, r'(?:^|;)u=(\d*)')
Update - To address provided data example:
Regular expression updated from r'^u=(\d*)')
to r'(?:^|;)u=(\d*)')
- hope it is self-descriptive, but if not - it makes match to be searched either at the begonning of string or after ;