Update XML using XMLStreamWriter

deepak · Aug 29, 2013 · Viewed 7.3k times

I have a large XML file and I want to update particular nodes in it (for example, removing duplicate nodes).

As the XML is huge, I considered using the StAX API class XMLStreamReader. I first read the XML using XMLStreamReader, stored the data in user objects, and manipulated those objects to remove duplicates.

Now I want to put the updated user objects back into my original XML. My idea was to marshal each user object to a string and place that string at the right position in the input XML, but I have not been able to achieve this with the StAX class XMLStreamWriter.

Can this be achieved using XMLStreamWriter? If not, please suggest an alternative approach to my problem.

My main concern is memory: I cannot load such huge XML files into our project server's memory, which is shared across multiple processes. Hence I do not want to use DOM, because it would use a lot of memory to load these huge files.

Answer

Prashant Bhate · Aug 30, 2013

If you need to alter a particular value, such as text content or a tag name, StAX might help. It can also help remove a few elements using createFilteredReader.
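As a rough illustration of createFilteredReader, here is a minimal sketch that copies a document while skipping comment events. The class name FilteredReaderSketch and the helper stripComments are hypothetical, chosen only for this example. The filter is kept stateless on purpose: the API does not promise that accept is called exactly once per event, and a filter sees only one event at a time, so dropping a whole element together with its children is usually easier in the main event loop.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;

final class FilteredReaderSketch {

    /** Hypothetical helper: copies the XML, skipping every comment event. */
    static String stripComments(String xml) {
        try {
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            XMLEventReader raw =
                    inputFactory.createXMLEventReader(new StringReader(xml));

            // Stateless filter: accept every event except comments.
            EventFilter noComments =
                    event -> event.getEventType() != XMLStreamConstants.COMMENT;
            XMLEventReader filtered =
                    inputFactory.createFilteredReader(raw, noComments);

            StringWriter out = new StringWriter();
            XMLEventWriter writer =
                    XMLOutputFactory.newInstance().createXMLEventWriter(out);
            // Copy whatever the filter lets through.
            while (filtered.hasNext()) {
                writer.add(filtered.nextEvent());
            }
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            // Wrapped only to keep the sketch's signature simple.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stripComments("<a><!--x--><b>t</b></a>"));
    }
}
```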

The code below renames Name to AuthorName, replaces "Hello" with "Oh" in text content, and adds a comment:

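import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class StAx {
    public static void main(String[] args) {

        String filename = "HelloWorld.xml";

        // Only the input file is managed here, so System.out is not closed
        try (InputStream in = new FileInputStream(filename)) {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLOutputFactory xof = XMLOutputFactory.newInstance();
            XMLEventFactory ef = XMLEventFactory.newInstance();

            XMLEventReader reader = factory.createXMLEventReader(filename, in);
            XMLEventWriter writer = xof.createXMLEventWriter(System.out);

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isCharacters()) {
                    // Replace "Hello" with "Oh" in text content
                    String data = event.asCharacters().getData();
                    if (data.contains("Hello")) {
                        event = ef.createCharacters(data.replace("Hello", "Oh"));
                    }
                    writer.add(event);
                } else if (event.isStartElement()) {
                    StartElement s = event.asStartElement();
                    String tagName = s.getName().getLocalPart();
                    if (tagName.equals("Name")) {
                        // Rename <Name> to <AuthorName>, keeping its attributes
                        String newName = "Author" + tagName;
                        writer.add(ef.createStartElement(new QName(newName),
                                s.getAttributes(), s.getNamespaces()));
                        // Add a comment as the first child of the renamed element
                        writer.add(ef.createCharacters("\n            "));
                        writer.add(ef.createComment("auto generated comment"));
                    } else {
                        writer.add(event);
                    }
                } else if (event.isEndElement() && event.asEndElement()
                        .getName().getLocalPart().equals("Name")) {
                    // Rename the matching end tag explicitly instead of
                    // relying on how the writer closes the open element
                    writer.add(ef.createEndElement(new QName("AuthorName"), null));
                } else {
                    // Everything else is copied through unchanged
                    writer.add(event);
                }
            }
            writer.flush();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}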

Input

<?xml version="1.0"?>
<BookCatalogue>
    <Book>
        <Title>HelloLord</Title>
        <Name>
            <first>New</first>
            <last>Earth</last>
        </Name>
        <ISBN>12345</ISBN>
    </Book>
    <Book>
        <Title>HelloWord</Title>
        <Name>
            <first>New</first>
            <last>Moon</last>
        </Name>
        <ISBN>12346</ISBN>
    </Book>
</BookCatalogue>

Output

<?xml version="1.0"?><BookCatalogue>
    <Book>
        <Title>OhLord</Title>
        <AuthorName>
            <!--auto generated comment-->
            <first>New</first>
            <last>Earth</last>
        </AuthorName>
        <ISBN>12345</ISBN>
    </Book>
    <Book>
        <Title>OhWord</Title>
        <AuthorName>
            <!--auto generated comment-->
            <first>New</first>
            <last>Moon</last>
        </AuthorName>
        <ISBN>12346</ISBN>
    </Book>
</BookCatalogue>
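Removing whole elements, as in the duplicate-removal case from the question, can follow the same event-loop pattern. The following is only a sketch under stated assumptions: duplicates are taken to be Book elements that share an ISBN (the question does not define what a duplicate is), Book elements are not nested, and the names DedupSketch and dropDuplicateBooks are hypothetical. It buffers the events of one Book at a time, so memory use is bounded by the largest single Book plus the set of ISBNs seen so far.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class DedupSketch {

    private static boolean isStart(XMLEvent e, String name) {
        return e.isStartElement()
                && e.asStartElement().getName().getLocalPart().equals(name);
    }

    private static boolean isEnd(XMLEvent e, String name) {
        return e.isEndElement()
                && e.asEndElement().getName().getLocalPart().equals(name);
    }

    /** Hypothetical helper: drops every Book whose ISBN was already seen. */
    static String dropDuplicateBooks(String xml) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter writer =
                    XMLOutputFactory.newInstance().createXMLEventWriter(out);

            Set<String> seenIsbns = new HashSet<>();
            List<XMLEvent> book = null;   // non-null while inside a <Book>
            StringBuilder isbn = null;    // ISBN text of the buffered Book
            boolean inIsbn = false;

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (book == null) {
                    if (isStart(event, "Book")) {
                        // Start buffering a new Book
                        book = new ArrayList<>();
                        book.add(event);
                        isbn = null;
                    } else {
                        // Outside any Book: copy through unchanged
                        writer.add(event);
                    }
                    continue;
                }
                book.add(event);
                if (isStart(event, "ISBN")) {
                    inIsbn = true;
                    isbn = new StringBuilder();
                } else if (inIsbn && event.isCharacters()) {
                    // Text may arrive in several Characters events
                    isbn.append(event.asCharacters().getData());
                } else if (isEnd(event, "ISBN")) {
                    inIsbn = false;
                } else if (isEnd(event, "Book")) {
                    // Write the Book only if it has no ISBN or a new one
                    if (isbn == null || seenIsbns.add(isbn.toString().trim())) {
                        for (XMLEvent buffered : book) {
                            writer.add(buffered);
                        }
                    }
                    book = null;
                }
            }
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            // Wrapped only to keep the sketch's signature simple.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dropDuplicateBooks(
                "<BookCatalogue>"
                + "<Book><Title>A</Title><ISBN>1</ISBN></Book>"
                + "<Book><Title>B</Title><ISBN>1</ISBN></Book>"
                + "</BookCatalogue>"));
    }
}
```

Whitespace between Book elements is copied as-is, so a dropped Book may leave a blank line behind in indented input.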

As you can see, things get really complicated when the modification goes beyond this, such as swapping two nodes or deleting a node based on the state of several other nodes (for example, deleting all books with a price above the average price).

The best solution in such cases is to produce the resulting XML using an XSLT transformation.
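To give an idea of what that could look like, here is a minimal sketch that runs an XSLT 1.0 stylesheet from Java through the standard javax.xml.transform API. The stylesheet is an identity transform plus one empty template that drops any Book whose ISBN already appeared on an earlier sibling; treating ISBN as the duplicate key is an assumption, and the names XsltSketch and transform are hypothetical. Keep in mind that XSLT 1.0 processors typically build an in-memory tree of the whole input, so this approach trades the memory savings of StAX for simplicity.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltSketch {

    // Identity transform, plus an empty template that suppresses any Book
    // whose ISBN equals the ISBN of an earlier sibling Book.
    private static final String STYLESHEET =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match='@*|node()'>"
            + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
            + "</xsl:template>"
            + "<xsl:template match='Book[ISBN = preceding-sibling::Book/ISBN]'/>"
            + "</xsl:stylesheet>";

    /** Hypothetical helper: applies the stylesheet above to an XML string. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped only to keep the sketch's signature simple.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
                "<BookCatalogue>"
                + "<Book><Title>A</Title><ISBN>1</ISBN></Book>"
                + "<Book><Title>B</Title><ISBN>1</ISBN></Book>"
                + "</BookCatalogue>"));
    }
}
```

For real files, the StringReader and StringWriter would be replaced by file-based StreamSource and StreamResult instances.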