I'm trying to get distinct values when using GROUP_CONCAT in BigQuery.
I'll recreate the situation using a simpler, static example:
EDIT: I've modified the example to represent better my real situation: 2 columns with group_concat which needs to be distinct:
SELECT
category,
GROUP_CONCAT(id) as ids,
GROUP_CONCAT(product) as products
FROM
(SELECT "a" as category, "1" as id, "car" as product),
(SELECT "a" as category, "2" as id, "car" as product),
(SELECT "a" as category, "3" as id, "car" as product),
(SELECT "b" as category, "4" as id, "car" as product),
(SELECT "b" as category, "5" as id, "car" as product),
(SELECT "b" as category, "2" as id, "bike" as product),
(SELECT "a" as category, "1" as id, "truck" as product),
GROUP BY
category
This example returns:
Row category ids products
1 a 1,2,3,1 car,car,car,truck
2 b 4,5,6 car,car,bike
I'd like to strip the duplicated values found, to return like:
Row category ids products
1 a 1,2,3 car,truck
2 b 4,5,6 car,bike
In MySQL, GROUP_CONCAT has a DISTINCT OPTION, but in BigQuery there isn't.
Any ideas?
Here is solution which uses UNIQUE
scope aggregation function to remove duplicates. Note, that in order to use it, first we need to build a REPEATED
using NEST
aggregation:
SELECT
GROUP_CONCAT(UNIQUE(ids)) WITHIN RECORD,
GROUP_CONCAT(UNIQUE(products)) WITHIN RECORD
FROM (
SELECT
category,
NEST(id) as ids,
NEST(product) as products
FROM
(SELECT "a" as category, "1" as id, "car" as product),
(SELECT "a" as category, "2" as id, "car" as product),
(SELECT "a" as category, "3" as id, "car" as product),
(SELECT "b" as category, "4" as id, "car" as product),
(SELECT "b" as category, "5" as id, "car" as product),
(SELECT "b" as category, "2" as id, "bike" as product),
(SELECT "a" as category, "1" as id, "truck" as product),
GROUP BY
category
)