Parsing an XML structure with an unknown amount of recursions using SAX

Octavian A. Damiean · Sep 29, 2010

I have to parse an XML structure in Java using the SAX parser.

The problem is that the structure is recursive, with an unknown depth of recursion. That alone would not be a big deal; the real problem is that I can't take advantage of XML namespaces, and the tags are the same at every level of recursion.

Here is an example of the structure.

<?xml version="1.0" encoding="UTF-8"?>
<RootTag>
    <!-- LOADS OF OTHER TAGS -->
    <Tags attribute="value">
        <Tag attribute="value">
            <SomeOtherTag></SomeOtherTag>
            <Tags attribute="value">
                <Tag attribute="value">
                    <SomeOtherTag></SomeOtherTag>
                    <Tags attribute="value">
                        <!-- MORE OF THE SAME STRUCTURE -->
                    </Tags>
                </Tag>
            </Tags>
        </Tag>
    </Tags>
    <!-- LOADS OF OTHER TAGS -->
</RootTag>

As you can see, the structure is recursive, with an undefined number of levels. My problem is how to extract all the data at every level and save it, for example in a HashMap.

I could define a ContentHandler for each occurrence of Tags, have it extract the content into a HashMap, and put that back into a master HashMap defined in the main content handler, but I'm not sure how to do this.
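The delegating idea can work without namespaces, because a SAX `XMLReader` accepts a new `ContentHandler` in the middle of a parse and uses it from the next event on. Below is a minimal sketch, assuming the element names from the example (`Tags`, `Tag`) and a non-namespace-aware parser, so elements are matched by `qName`. The class names `DelegatingTagsParser`, `TagsHandler`, `RootHandler` and `TagsNode` are hypothetical. Each `<Tags>` level gets its own handler; a nested `<Tags>` spawns a child handler, which hands control back to its parent when its own `</Tags>` arrives. It only records each level's attributes and the nesting; extracting `SomeOtherTag` content would go where the comment marks it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class DelegatingTagsParser {

    /** Data for one <Tags> level (hypothetical model): its attributes and nested levels. */
    static class TagsNode {
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<TagsNode> children = new ArrayList<>();
    }

    /** Receives every event inside one <Tags>; nested <Tags> go to a fresh instance. */
    static class TagsHandler extends DefaultHandler {
        private final XMLReader reader;
        private final ContentHandler parent; // restored when our own </Tags> arrives
        final TagsNode node = new TagsNode();
        private int depth = 0;               // open non-<Tags> elements inside this level

        TagsHandler(XMLReader reader, ContentHandler parent, Attributes atts) {
            this.reader = reader;
            this.parent = parent;
            for (int i = 0; i < atts.getLength(); i++) {
                node.attributes.put(atts.getQName(i), atts.getValue(i));
            }
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (qName.equals("Tags")) {
                // Next recursion level: delegate the whole subtree to a child handler.
                TagsHandler child = new TagsHandler(reader, this, atts);
                node.children.add(child.node);
                reader.setContentHandler(child);
            } else {
                depth++; // <Tag>, <SomeOtherTag>, ...: extract their data here if needed
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (depth == 0) {
                reader.setContentHandler(parent); // this is our own </Tags>: give control back
            } else {
                depth--;
            }
        }
    }

    /** Top-level handler: ignores everything except the outermost <Tags> elements. */
    static class RootHandler extends DefaultHandler {
        private final XMLReader reader;
        final List<TagsNode> roots = new ArrayList<>();

        RootHandler(XMLReader reader) { this.reader = reader; }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (qName.equals("Tags")) {
                TagsHandler child = new TagsHandler(reader, this, atts);
                roots.add(child.node);
                reader.setContentHandler(child);
            }
        }
    }

    static List<TagsNode> parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        RootHandler root = new RootHandler(reader);
        reader.setContentHandler(root);
        reader.parse(new InputSource(new StringReader(xml)));
        return root.roots;
    }
}
```

Each handler only counts the elements inside its own level, so no handler needs to know how deep the recursion goes; the "master HashMap" becomes a tree of `TagsNode` objects, one per level.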

How do I extract and save the content of a recursive XML structure without using namespaces?

Answer

Nathan Hughes · Sep 29, 2010

Check out this set of JavaWorld articles on using SAX. They demonstrate an easy way to parse a recursive XML structure with SAX: a state machine records, for each element, which elements it can contain, and as your ContentHandler traverses the XML, it keeps a stack showing which element it is currently in.
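A minimal sketch of the stack approach, written against the example document rather than taken from the articles. The `ALLOWED` table is a hypothetical, hand-written version of the state machine (which children each element may contain, inferred from the example), and `StackTagsHandler` and `Node` are illustrative names. The top of the stack is always the element currently being parsed, so a new element simply becomes a child of `stack.peek()`, no matter how deep the recursion goes or how often the same tag name repeats. It assumes Java 9+ for `Map.of` and `Set.of`.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class StackTagsHandler extends DefaultHandler {

    /** One element inside a <Tags> subtree, with its attributes, text and children. */
    static class Node {
        final String name;
        final Map<String, String> attributes = new HashMap<>();
        final List<Node> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();
        Node(String name) { this.name = name; }
    }

    // Hypothetical state machine: allowed children per element, inferred from the example.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
            "Tags", Set.of("Tag"),
            "Tag", Set.of("SomeOtherTag", "Tags"),
            "SomeOtherTag", Set.of());

    private final Deque<Node> stack = new ArrayDeque<>(); // path from outermost <Tags> to current element
    final List<Node> roots = new ArrayList<>();            // one entry per outermost <Tags>

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
        if (stack.isEmpty() && !qName.equals("Tags")) {
            return; // outside any <Tags> subtree: skip RootTag and the other tags
        }
        if (!stack.isEmpty()) {
            String parent = stack.peek().name;
            if (!ALLOWED.getOrDefault(parent, Set.of()).contains(qName)) {
                throw new SAXException("<" + qName + "> not allowed inside <" + parent + ">");
            }
        }
        Node node = new Node(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            node.attributes.put(atts.getQName(i), atts.getValue(i));
        }
        if (stack.isEmpty()) {
            roots.add(node);
        } else {
            stack.peek().children.add(node); // attach to the element we are currently in
        }
        stack.push(node);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!stack.isEmpty()) {
            stack.peek().text.append(ch, start, length); // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (!stack.isEmpty()) {
            stack.pop(); // leave the current element; its parent is on top again
        }
    }

    static List<Node> parse(String xml) throws Exception {
        StackTagsHandler handler = new StackTagsHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.roots;
    }
}
```

`parse(xml)` returns one `Node` per outermost `<Tags>`; each node's `attributes` map and `children` list hold the data for that level, and an element that the table does not permit aborts the parse with a `SAXException`.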