Return all matches of a regular expression in Oracle

Mike picture Mike · Jan 8, 2013 · Viewed 8.4k times · Source

I have a table that contains a VARCHAR2 column called COMMANDS.

The data in this column is a bunch of difficult to read ZPL code that will be sent to a label printer, and amidst the ZPL there are several tokens in the form {TABLE.COLUMN}.

I would a like nice list of all the distinct {TABLE.COLUMN} tokens that are found in COMMANDS. I wrote the following regex to match the token format:

SELECT REGEXP_SUBSTR(COMMANDS,'\{\w+\.\w+\}') FROM MYTABLE;

The regex works, but it only returns the first matched token per row. Is there a way to return all regex matches for each row?

I'm using Oracle 11GR2.

Edit - Here is a small sample of data from a single row -- there are many such lines in each row:

^FO360,065^AEN,25,10^FD{CUSTOMERS.CUST_NAME}^FS
^FO360,095^AAN,15,12^FD{CUSTOMERS.CUST_ADDR1}^FS

So if that was the only row in table, I'd like to have returned:

{CUSTOMERS.CUST_NAME}
{CUSTOMERS.CUST_ADDR1}

Answer

Nick Krasnov picture Nick Krasnov · Jan 8, 2013

You've provided sample of data saying that this is a single row but have presented it as two different rows. So this example based on your words.

 -- Sample of data from your question + extra row for the sake of demonstration
 -- id column is added to distinguish the rows(I assume you have one)
  with t1(id, col) as( 
    select 1, '^FO360,065^AEN,25,10^FD{CUSTOMERS1.CUST_NAME}^FS^FO360,095^AAN,15,12^FD{CUSTOMERS1.CUST_ADDR1}^FS' from dual union all
    select 2, '^FO360,065^AEN,25,10^FD{CUSTOMERS2.CUST_NAME}^FS^FO360,095^AAN,15,12^FD{CUSTOMERS2.CUST_ADDR2}^FS' from dual
  ),
  cnt(c) as(
    select level
      from (select max(regexp_count(col, '{\w+.\w+}')) as o_c
              from t1
            ) z
     connect by level <= z.o_c
  )
  select t1.id, listagg(regexp_substr(t1.col, '{\w+.\w+}', 1, cnt.c)) within group(order by t1.id) res
    from t1
   cross join cnt
   group by t1.id

Result:

    ID   RES
    ---------------------------------------------------------
     1   {CUSTOMERS1.CUST_ADDR1}{CUSTOMERS1.CUST_NAME}
     2   {CUSTOMERS2.CUST_ADDR2}{CUSTOMERS2.CUST_NAME}

As per @a_horse_with_no_name comment to the question, really, it's much simpler to just replace everything else that doesn't match the pattern. Here is an example:

 with t1(col) as(
    select '^FO360,065^AEN,25,10^FD{CUSTOMERS.CUST_NAME}^FS^FO360,095^AAN,15,12^FD{CUSTOMERS.CUST_ADDR1}^FS' from dual
 )
 select regexp_replace(t1.col, '({\w+.\w+})|.', '\1') res
  from t1

Result:

RES
-------------------------------------------
{CUSTOMERS.CUST_NAME}{CUSTOMERS.CUST_ADDR1}