How to extract element-path from XMLType Node?

towi picture towi · Apr 9, 2013 · Viewed 13.7k times · Source

I would like to have a select statement on an XML document and one column should return me the path of each node.

For example, given the data

SELECT * 
FROM TABLE(XMLSequence(
  XMLTYPE('<?xml version="1.0"?>
    <users><user><name>user1</name></user>
           <user><name>user2</name></user>
           <group>
              <user><name>user3</name></user>
           </group>
           <user><name>user4</name></user>
    </users>').extract('/*//*[text()]'))) t;

Which results in

column_value
--------
<user><name>user1</name></user>
<user><name>user2</name></user>
<user><name>user3</name></user>
<user><name>user4</name></user>

I'd like to have a result like this:

path                     value
------------------------ --------------
/users/user/name         user1
/users/user/name         user2
/users/group/user/name   user3
/users/user/name         user4

I can not see how to get to this. I figure there are two thing that have to work together properly:

  • Can I extract the path from an XMLType with a single operation or method, or do I have to do this with string-magic?
  • What is the correct XPath expression so that I do get the whole element path (if thats possible), eg. <users><group><user><name>user3</name></user></group></user> insead of <user><name>user3</name></user>?

Maybe I am not understanding XMLType fully, yet. It could be I need a different approach, but I can not see it.

Sidenotes:

  • In the final version the XML document will be coming from CLOBs of a table, not a static document.
  • The path column can of course also use dots or whatever and the initial slash is not the issue, any representation would do.
  • Also I would not mind if every inner node also gets a result row (possibly with null as value), not only the ones with text() in it (which is what I am really interested in).
  • In the end I will need the tail element of path separate (always "name" in the example here, but this will vary later), i.e. ('/users/groups/user', 'name', 'user3'), I can deal with that separately.

Answer

ThinkJet picture ThinkJet · Apr 9, 2013

You can achieve that with help of XMLTable function from Oracle XML DB XQuery function set:

select * from 
  XMLTable(
    '
     declare function local:path-to-node( $nodes as node()* )  as xs:string* {
       $nodes/string-join(ancestor-or-self::*/name(.), ''/'')
     };
     for $i in $rdoc//name 
       return <ret><name_path>{local:path-to-node($i)}</name_path>{$i}</ret>
    '
    passing 
    XMLParse(content '
      <users><user><name>user1</name></user>
           <user><name>user2</name></user>
           <group>
              <user><name>user3</name></user>
           </group>
           <user><name>user4</name></user>
      </users>'
    )
    as "rdoc"
    columns 
      name_path  varchar2(4000) path '//ret/name_path',
      name_value varchar2(4000) path '//ret/name'

  )

For me XQuery looks at least more intuitive for XML data manipulation than XSLT.

You can find useful set of XQuery functions here.

Update 1

I suppose that you need totally plain dataset with full data at last stage. This target can be reached by complicated way, constructed step-by-step below, but this variant is very resource-angry. I propose to review final target (selecting some specific records, count number of elements etc.) and after that simplify this solution or totally change it.

Update 2

All steps deleted from this Update except last because @A.B.Cade proposed more elegant solution in comments. This solution provided in Update 3 section below.

Step 1 - Constructing dataset of id's with corresponding query results

Step 2 - Aggregating to single XML row

Step 3 - Finally get full plain dataset by querying constracted XML with XMLTable

with xmlsource as (
  -- only for purpose to write long string only once
  select '
      <users><user><name>user1</name></user>
           <user><name>user2</name></user>
           <group>
              <user><name>user3</name></user>
           </group>
           <user><name>user4</name></user>
      </users>' xml_string
   from dual   
),
xml_table as ( 
  -- model of xmltable
  select 10 id, xml_string xml_data from xmlsource union all 
  select 20 id, xml_string xml_data from xmlsource union all 
  select 30 id, xml_string xml_data from xmlsource 
) 
select  *
from
  XMLTable(
    '
        for $entry_user in $full_doc/full_list/list_entry/name_info
          return <tuple>
                   <id>{data($entry_user/../@id_value)}</id>
                   <path>{$entry_user/name_path/text()}</path>
                   <name>{$entry_user/name_value/text()}</name>
                  </tuple> 
    '
    passing ( 
      select  
        XMLElement("full_list", 
          XMLAgg(     
            XMLElement("list_entry",
              XMLAttributes(id as "id_value"),
              XMLQuery(
                '
                 declare function local:path-to-node( $nodes as node()* )  as xs:string* {
                   $nodes/string-join(ancestor-or-self::*/name(.), ''/'')
                 };(: function to construct path :) 
                 for $i in $rdoc//name return <name_info><name_path>{local:path-to-node($i)}</name_path><name_value>{$i/text()}</name_value></name_info>
                '
                passing by value XMLParse(content xml_data) as "rdoc"
                returning content
              )
            )
          )
        )        
        from xml_table
    )   
    as "full_doc"      
    columns
      id_val   varchar2(4000) path '//tuple/id',
      path_val varchar2(4000) path '//tuple/path',
      name_val varchar2(4000) path '//tuple/name'
  )    

Update 3

As mentioned by @A.B.Cade in his comment, there are really simple way to join ID's with XQuery results.

Because I don't like external links in answers, code below represents his SQL fiddle, a little bit adapted to the data source from this answer:

with xmlsource as (
  -- only for purpose to write long string only once
  select '
      <users><user><name>user1</name></user>
           <user><name>user2</name></user>
           <group>
              <user><name>user3</name></user>
           </group>
           <user><name>user4</name></user>
      </users>' xml_string
   from dual   
),
xml_table as ( 
  -- model of xmltable
  select 10 id, xml_string xml_data from xmlsource union all 
  select 20 id, xml_string xml_data from xmlsource union all
  select 30 id, xml_string xml_data from xmlsource
)
select xd.id, x.*  from
xml_table xd,
  XMLTable(
    'declare function local:path-to-node( $nodes as node()* )  as xs:string* {$nodes/string-join(ancestor-or-self::*/name(.), ''/'')     };     for $i in $rdoc//name        return <ret><name_path>{local:path-to-node($i)}</name_path>{$i}</ret>    '
    passing
    XMLParse(content xd.xml_data
    )
    as "rdoc"
    columns
      name_path  varchar2(4000) path '//ret/name_path',
      name_value varchar2(4000) path '//ret/name'

  ) x