XML - Referencing Other XML Files

Scott · Jun 30, 2009 · Viewed 19.5k times

I'm new to XML, so this may be a fairly easy question to answer. I was wondering if there is a standard way of referencing external XML files from within other XML files. Let me give an example. Say you have a file which defines a single object that holds a large amount of data:

<person>
    <name>John</name>
    <age>18</age>
    <hair>Brown</hair>
    <eyes>Blue</eyes>
</person>

For the sake of this question, pretend that person holds loads of other information. Pretend the file is like 10 MB.

Now, let's say you have another XML file which defines a group:

<group>
    <person>
        <name>John</name>
        <age>18</age>
        <hair>Brown</hair>
        <eyes>Blue</eyes>
    </person>
    <person>
        <name>Kim</name>
        <age>21</age>
        <hair>Blue</hair>
        <eyes>Green</eyes>
    </person>
    <person>
        <name>Sean</name>
        <age>22</age>
        <hair>Black</hair>
        <eyes>Brown</eyes>
    </person>
</group>

As you can see, if each person were very large, the Group file would be extremely large. So, if we have something like John.xml, is there a standard way to reference it in Group.xml without explicitly repeating all of John's data? I'm sure this is a very broad topic, so feel free to link me to any relevant web pages. Thanks!

Answer

lavinio · Jul 1, 2009

Standards

XInclude is the only standard with any level of support.

  • Several XML editors, including oXygen and XMLSpy, support it.
  • Several XML parsers, including Xerces, also support it, and there are .NET ports too.
  • Several XML tools, such as Saxon, support it, both for Java and .NET.

There are some good examples of use in the Wikipedia article on XInclude.
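To make that concrete for the John.xml / Group.xml case in the question, here is a minimal sketch using Python's standard-library xml.etree.ElementInclude module. Python is only my choice for illustration; the load_group helper and the temporary directory are assumptions, not part of any of the tools listed above. Group.xml contains just an xi:include element pointing at John.xml, and the include is expanded at parse time.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# The large per-person file from the question (shortened here).
JOHN_XML = """<person>
    <name>John</name>
    <age>18</age>
    <hair>Brown</hair>
    <eyes>Blue</eyes>
</person>"""

# Group.xml references John.xml instead of repeating its content.
GROUP_XML = """<group xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="John.xml"/>
</group>"""


def load_group(directory):
    """Hypothetical helper: parse Group.xml and expand its xi:include elements."""
    def loader(href, parse, encoding=None):
        # Resolve each href relative to the directory that holds Group.xml.
        path = os.path.join(directory, href)
        return ElementInclude.default_loader(path, parse, encoding)

    root = ET.parse(os.path.join(directory, "Group.xml")).getroot()
    ElementInclude.include(root, loader=loader)  # replaces xi:include in place
    return root


with tempfile.TemporaryDirectory() as tmp:
    for filename, text in (("John.xml", JOHN_XML), ("Group.xml", GROUP_XML)):
        with open(os.path.join(tmp, filename), "w", encoding="utf-8") as f:
            f.write(text)
    group = load_group(tmp)
    names = [person.findtext("name") for person in group.findall("person")]
    print(names)
```

Note that this only keeps Group.xml small on disk. Once the includes are expanded, the whole tree is still in memory, so it does not by itself help with very large documents.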

XLink is a tangentially related standard, not really for including documents, but more for citing portions within other documents. It's not well supported.

Alternatives

If you are worried about size, there are several ways to go:

  • Use a streaming XML processor, such as DataDirect XQuery (or, to a lesser extent, Saxon 9.3 EE), which keeps only as much information in memory as is necessary to evaluate the query.
  • Use an XML database, such as MarkLogic or eXist.
  • Use one XML file to list the names of other XML files, which some program written in XQuery or XSLT then reads using the doc() function and processes. (Unless your processor is streaming or has a way to dispose of documents it is finished with, as DataDirect XQuery or Saxon do, you will still run into the same size problem, though.)
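The last option can be sketched roughly as follows. This uses Python rather than XQuery or XSLT, so it is an analogue of the doc() approach and not the same thing; the personRef element, its href attribute, and the iter_person_names function are made-up names for illustration. The idea is that the group file only lists file names, and each person file is parsed, used, and released before the next one is opened.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def iter_person_names(manifest_path):
    """Yield each person's name, loading one referenced file at a time."""
    base_dir = os.path.dirname(manifest_path)
    manifest = ET.parse(manifest_path).getroot()
    for ref in manifest.findall("personRef"):  # hypothetical element name
        # Parse one referenced document, take what is needed, then drop it
        # so that only one person file is held in memory at a time.
        person = ET.parse(os.path.join(base_dir, ref.get("href"))).getroot()
        yield person.findtext("name")


with tempfile.TemporaryDirectory() as tmp:
    people = {"John.xml": "John", "Kim.xml": "Kim", "Sean.xml": "Sean"}
    for filename, name in people.items():
        with open(os.path.join(tmp, filename), "w", encoding="utf-8") as f:
            f.write(f"<person><name>{name}</name></person>")

    # The group file lists file names only, not the person data itself.
    manifest_path = os.path.join(tmp, "Group.xml")
    refs = "".join(f'<personRef href="{filename}"/>' for filename in people)
    with open(manifest_path, "w", encoding="utf-8") as f:
        f.write(f"<group>{refs}</group>")

    names = list(iter_person_names(manifest_path))
    print(names)
```

Because the function is a generator, peak memory is bounded by the largest single person file rather than by the whole group, which is the property the parenthetical above is about.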