How to split a CLOB object using , and : delimiter in Oracle into multiple records

Usha picture Usha · Jul 15, 2016 · Viewed 10.2k times · Source

I have a CLOB object sample as shown below. I want to first split this by using delimiter "," and save it in a temporary table for later use.

ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0

I want to save the result in the below format in each row.

Column_Name
__________________________
ABCDEF:PmId12345RmLn1VlId0
ABCDEF:PmId12345RmLn1VlId0
ABCDEF:PmId12345RmLn1VlId0

I tried using REGEXP_SUBSTR function

select 
    regexp_substr('ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0', '[^,]+', 1, 1) Column_Name 
from dual;

The above query gives me single record like below

Column_Name
__________________________
ABCDEF:PmId12345RmLn1VlId0

Can anyone help me solve this issue.

Answer

mathguy picture mathguy · Jul 15, 2016

Here is a solution using a recursive factored subquery (Oracle 11.2 and above):

with inputs ( str ) as (
       select to_clob('ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0,ABCDEF:PmId12345RmLn1VlId0')
       from dual
     ),
     prep ( s, n, token, st_pos, end_pos ) as (
       select ',' || str || ',', -1, null, null, 1
         from inputs
       union all
       select s, n+1, substr(s, st_pos, end_pos - st_pos),
              end_pos + 1, instr(s, ',', 1, n+3)
         from prep
         where end_pos != 0
     )
select n as idx, token as column_name
from   prep
where  n > 0;



   IDX COLUMN_NAME
------ ----------------------------
     1 ABCDEF:PmId12345RmLn1VlId0
     2 ABCDEF:PmId12345RmLn1VlId0
     3 ABCDEF:PmId12345RmLn1VlId0
     4 ABCDEF:PmId12345RmLn1VlId0
     5 ABCDEF:PmId12345RmLn1VlId0

Notes:

You said CLOB but in your example you extracted from a varchar2 string. I added to_clob() to see if/how this works on a CLOB.

I used instr and substr, as they often (usually?) perform between better and much better than their regexp equivalents.

I saved the "index" of each substring within the input string; in some cases the order of the tokens in the input string is important. (Not in your example though, you just had the same token repeated five times.)

If you need better performance, especially if your CLOBs are very large, you may be better off using dbms_lob.substr and dbms_lob.instr - see Performance of SUBSTR on CLOB, especially Alex Poole's answer, and documentation here: http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_lob.htm#BABEAJAD. Note the syntax differences vs regular substr / instr.