Produce DISTINCT values in STRING_AGG

Brent Arias picture Brent Arias · Aug 2, 2018 · Viewed 23.5k times · Source

I'm using the STRING_AGG function in SQL Server 2017. I'd like to create the same effect as COUNT(DISTINCT <column>). I tried STRING_AGG(DISTINCT <column>,',') but that is not legal syntax.

I'd like to know if there is a T-SQL work-around. Here is my sample:

WITH Sitings 
  AS
  (
    SELECT * FROM (VALUES 
      (1, 'Florida', 'Orlando', 'bird'),
      (2, 'Florida', 'Orlando', 'dog'),
      (3, 'Arizona', 'Phoenix', 'bird'),
      (4, 'Arizona', 'Phoenix', 'dog'),
      (5, 'Arizona', 'Phoenix', 'bird'),
      (6, 'Arizona', 'Phoenix', 'bird'),
      (7, 'Arizona', 'Phoenix', 'bird'),
      (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
  ) 
SELECT State, City, COUNT(DISTINCT Siting) [# Of Types], STRING_AGG(Siting,',') Animals
FROM Sitings 
GROUP BY State, City

The above produces the following result:

+---------+-----------+--------------+-------------------------+
|  State  |   City    | # Of Types   |         Animals         |
+---------+-----------+--------------+-------------------------+
| Arizona | Flagstaff |            1 | dog                     |
| Florida | Orlando   |            2 | dog,bird                |
| Arizona | Phoenix   |            2 | bird,bird,bird,dog,bird |
+---------+-----------+--------------+-------------------------+

The output is exactly what I want, except I want the concatenated "Animals" listed for Phoenix Arizona to be DISTINCT, like this:

+---------+-----------+--------------+--------------------+
|  State  |   City    | # Of Types   |      Animals       |
+---------+-----------+--------------+--------------------+
| Arizona | Flagstaff |            1 | dog                |
| Florida | Orlando   |            2 | dog,bird           |
| Arizona | Phoenix   |            2 | bird,dog           |
+---------+-----------+--------------+--------------------+

Any ideas?

When I use my real data set, which is much larger, I get an error about the "Animals" column exceeding 8000 characters.

My question I think is the same as this one, except my example is much simpler.

Answer

Vladimir Baranov picture Vladimir Baranov · Aug 2, 2018

Here is one way to do it.

Since you want the distinct counts as well, it can be done simply by grouping the rows twice. The first GROUP BY will remove duplicates, the second GROUP BY will produce the final result.

WITH
Sitings
AS
(
    SELECT * FROM (VALUES 
    (1, 'Florida', 'Orlando', 'bird'),
    (2, 'Florida', 'Orlando', 'dog'),
    (3, 'Arizona', 'Phoenix', 'bird'),
    (4, 'Arizona', 'Phoenix', 'dog'),
    (5, 'Arizona', 'Phoenix', 'bird'),
    (6, 'Arizona', 'Phoenix', 'bird'),
    (7, 'Arizona', 'Phoenix', 'bird'),
    (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
)
,CTE_Animals
AS
(
    SELECT
        State, City, Siting
    FROM Sitings
    GROUP BY State, City, Siting
)
SELECT
    State, City, COUNT(1) AS [# Of Sitings], STRING_AGG(Siting,',') AS Animals
FROM CTE_Animals
GROUP BY State, City
ORDER BY
    State
    ,City
;

Result

+---------+-----------+--------------+----------+
|  State  |   City    | # Of Sitings | Animals  |
+---------+-----------+--------------+----------+
| Arizona | Flagstaff |            1 | dog      |
| Arizona | Phoenix   |            2 | bird,dog |
| Florida | Orlando   |            2 | bird,dog |
+---------+-----------+--------------+----------+

If you are still getting an error message about exceeding 8000 characters, then cast the values to varchar(max) before STRING_AGG.

Something like

STRING_AGG(CAST(Siting AS varchar(max)),',') AS Animals