group_concat in SQL Server 2012 with ORDER BY another column

AGS · Nov 12, 2012 · Viewed 8.5k times

I have a table containing roughly a million rows like this:

customer_id | purchased_at     | product
1           | 2012-06-01 00:00 | apples
1           | 2012-09-02 00:00 | apples
1           | 2012-10-01 00:00 | pears
2           | 2012-06-01 00:00 | apples
2           | 2012-07-01 00:00 | apples
3           | 2012-09-02 00:00 | pears
3           | 2012-10-01 00:00 | apples
3           | 2012-10-01 01:00 | bananas
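
For anyone who wants to reproduce this, the sample rows above can be loaded with a script along these lines. This is only a sketch: the table name purchases is taken from the queries in this post, but the column types (int, datetime, varchar(50)) are assumptions, since the real schema is not shown.

```sql
-- Assumed schema: the column types are guesses, not the real table definition.
create table purchases (
    customer_id  int         not null,
    purchased_at datetime    not null,
    product      varchar(50) not null
);

-- The eight sample rows from the table above.
insert into purchases (customer_id, purchased_at, product) values
    (1, '2012-06-01T00:00:00', 'apples'),
    (1, '2012-09-02T00:00:00', 'apples'),
    (1, '2012-10-01T00:00:00', 'pears'),
    (2, '2012-06-01T00:00:00', 'apples'),
    (2, '2012-07-01T00:00:00', 'apples'),
    (3, '2012-09-02T00:00:00', 'pears'),
    (3, '2012-10-01T00:00:00', 'apples'),
    (3, '2012-10-01T01:00:00', 'bananas');
```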

I want to concatenate the products into one row per customer, listing each product only once (DISTINCT) and in order of purchased_at.

In MySQL I just use

select customer_id, min(purchased_at) as first_purchased_at, 
group_concat(DISTINCT product order by purchased_at) as all_purchased_products
from purchases group by customer_id;

to get

customer_id | first_purchased_at | all_purchased_products
1           | 2012-06-01 00:00   | apples,pears
2           | 2012-06-01 00:00   | apples
3           | 2012-09-02 00:00   | pears,apples,bananas

How can I do that in SQL Server 2012?

I tried the following 'hack', which works, but it is overkill and performs poorly on a large table:

select
  customer_id,
  min(purchased_at) as first_purchased_at,
  stuff((select ',' + p3.product
         from (select p2.product, p2.purchased_at,
                      row_number() over(partition by p2.product order by p2.purchased_at) as seq
               from purchases p2
               where p2.customer_id = p1.customer_id) p3
         where p3.seq = 1
         order by p3.purchased_at
         for XML PATH('')), 1, 1, '') AS all_purchased_products
from purchases p1
group by customer_id;

What can I do to solve this?

Answer

Taryn · Nov 12, 2012

I am not sure whether this will be any faster, but here is an alternative version that avoids joining on purchases twice inside the STUFF():

select customer_id,
  min(purchased_at) as first_purchased_at,
  stuff ((select ',' +  p2.product 
          from
          (
            select product, customer_id,
                -- rn = 1 marks a customer's first purchase of each product
                ROW_NUMBER() over(partition by customer_id, product order by purchased_at) rn,
                -- rnk numbers all of a customer's purchases chronologically
                ROW_NUMBER() over(partition by customer_id order by purchased_at) rnk   
            from purchases
          ) p2 
          where p2.customer_id = p1.customer_id
            and p2.rn = 1   -- keep one row per distinct product
          group by p2.product, rn, rnk
          order by rnk
          -- stuff(..., 1,1,'') removes the leading comma
          for XML PATH('') ), 1,1,'') AS all_purchased_products  
from purchases p1
group by customer_id;

See SQL Fiddle with Demo

Result:

| CUSTOMER_ID |               FIRST_PURCHASED_AT | ALL_PURCHASED_PRODUCTS |
---------------------------------------------------------------------------
|           1 |      June, 01 2012 00:00:00+0000 |           apples,pears |
|           2 |      June, 01 2012 00:00:00+0000 |                 apples |
|           3 | September, 02 2012 00:00:00+0000 |   pears,apples,bananas |
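
One caveat with the FOR XML PATH('') trick: characters that are special in XML (&, <, >) come back entity-encoded, so a product named 'fish & chips' would appear as 'fish &amp; chips'. If your product names can contain such characters, a commonly used variation is to add the TYPE directive and pull the text back out with .value(). The following is a sketch of the same query with only that change; the nvarchar(max) target type is an assumption, so pick whatever fits your column.

```sql
select customer_id,
  min(purchased_at) as first_purchased_at,
  stuff ((select ',' + p2.product
          from
          (
            select product, customer_id,
                ROW_NUMBER() over(partition by customer_id, product order by purchased_at) rn,
                ROW_NUMBER() over(partition by customer_id order by purchased_at) rnk
            from purchases
          ) p2
          where p2.customer_id = p1.customer_id
            and p2.rn = 1
          group by p2.product, rn, rnk
          order by rnk
          -- TYPE returns an xml value; .value('.', ...) converts it back to
          -- plain text, so entities such as &amp; are decoded again
          for XML PATH(''), TYPE ).value('.', 'nvarchar(max)'), 1,1,'') AS all_purchased_products
from purchases p1
group by customer_id;
```

For the sample data in the question the output is the same as above, because none of the product names contain XML special characters.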