How to use regex in Bigquery

Shivkumar kondi picture Shivkumar kondi · Dec 4, 2018 · Viewed 31.6k times · Source

I am unable to apply a proper regex on customtarget column in bigquery.

With normal MSSQL:

SELECT * from mytable where CustomTargeting like = '%u=%'  -- is all okay

With Bigquery(legacy-sql) :

SELECT REGEXP_EXTRACT(CustomTargeting, r'[^u=\d]') as validate_users
from [project:dataset.impressions_4213_20181112] Limit 10

Error:

Exactly one capturing group must be specified

Update:

Yet couldn't get substring u ='anystring'

enter image description here

How can I extract data where CustomTargeting ='%u=somestring%'?

Answer

Mikhail Berlyant picture Mikhail Berlyant · Dec 4, 2018

For BigQuery Legacy SQL

In SELECT statement list you can use
SELECT REGEXP_EXTRACT(CustomTargeting, r'(?:^|;)u=(\d*)')

In WHERE clause - you can use
WHERE REGEXP_MATCH(CustomTargeting, r'(?:^|;)u=(\d*)')

So, you query can look like

#legacySQL
SELECT CustomTargeting, REGEXP_EXTRACT(CustomTargeting, r'(?:^|;)u=(\d*)') 
FROM [project:dataset.impressions_4213_20181112]
WHERE REGEXP_MATCH(CustomTargeting, r'(?:^|;)u=(\d*)')   

For BigQuery Standard SQL

Same for SELECT
But different for WHERE - WHERE REGEXP_CONTAINS(CustomTargeting, r'(?:^|;)u=(\d*)')

#standardSQL
SELECT CustomTargeting, REGEXP_EXTRACT(CustomTargeting, r'(?:^|;)u=(\d*)') 
FROM `project.dataset.impressions_4213_20181112`
WHERE REGEXP_CONTAINS(CustomTargeting, r'(?:^|;)u=(\d*)')  

Update - To address provided data example:

Regular expression updated from r'^u=(\d*)') to r'(?:^|;)u=(\d*)') - hope it is self-descriptive, but if not - it makes match to be searched either at the begonning of string or after ;