Redshift. Convert comma delimited values into rows

Yuri Levinsky picture Yuri Levinsky · Aug 4, 2014 · Viewed 19.8k times · Source

I am wondering how to convert comma-delimited values into rows in Redshift. I am afraid that my own solution isn't optimal. Please advise. I have table with one of the columns with coma-separated values. For example:

I have:

user_id|user_name|user_action
-----------------------------
1      | Shone   | start,stop,cancell...

I would like to see

user_id|user_name|parsed_action 
------------------------------- 
1      | Shone   | start        
1      | Shone   | stop         
1      | Shone   | cancell      
....

Answer

Bob Baxley picture Bob Baxley · Aug 13, 2015

A slight improvement over the existing answer is to use a second "numbers" table that enumerates all of the possible list lengths and then use a cross join to make the query more compact.

Redshift does not have a straightforward method for creating a numbers table that I am aware of, but we can use a bit of a hack from https://www.periscope.io/blog/generate-series-in-redshift-and-mysql.html to create one using row numbers.

Specifically, if we assume the number of rows in cmd_logs is larger than the maximum number of commas in the user_action column, we can create a numbers table by counting rows. To start, let's assume there are at most 99 commas in the user_action column:

select 
  (row_number() over (order by true))::int as n
into numbers
from cmd_logs
limit 100;

If we want to get fancy, we can compute the number of commas from the cmd_logs table to create a more precise set of rows in numbers:

select
  n::int
into numbers
from
  (select 
      row_number() over (order by true) as n
   from cmd_logs)
cross join
  (select 
      max(regexp_count(user_action, '[,]')) as max_num 
   from cmd_logs)
where
  n <= max_num + 1;

Once there is a numbers table, we can do:

select
  user_id, 
  user_name, 
  split_part(user_action,',',n) as parsed_action 
from
  cmd_logs
cross join
  numbers
where
  split_part(user_action,',',n) is not null
  and split_part(user_action,',',n) != '';