I have a large XML file and I want to update particular nodes in it (for example, removing duplicate nodes).
Because the XML is huge, I considered using the StAX API class XMLStreamReader. I first read the XML with XMLStreamReader, stored the data in user objects, and manipulated those objects to remove the duplicates.
Now I want to put the updated user objects back into my original XML. My idea was to marshal each user object to a string and place that string at the right position in the input XML, but I have not been able to do this with the StAX class XMLStreamWriter.
Can this be achieved with XMLStreamWriter? If not, please suggest an alternative approach to my problem.
My main concern is memory: I cannot load such huge XML files into our project server's memory, which is shared across multiple processes. For that reason I do not want to use DOM, which would need a lot of memory to load these files.
If you only need to alter particular values, such as text content or tag names, StAX can help. It can also help with removing a few elements, using XMLInputFactory.createFilteredReader.
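As a rough illustration of createFilteredReader, here is a minimal sketch that copies a document while dropping every comment. The class name FilteredCopy and the method stripComments are made up for this example. The filter is kept stateless on purpose: a filtered reader may consult the filter more than once for the same event (for instance from both hasNext() and nextEvent()), so a filter that counts depth to skip a whole subtree is easier to get wrong than a skip done in the copy loop itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;

// Hypothetical helper, not part of the original answer.
final class FilteredCopy {

    /** Returns a copy of the given XML with all comments removed. */
    static String stripComments(String xml) {
        // Stateless filter: accept everything except comment events.
        EventFilter noComments =
                event -> event.getEventType() != XMLStreamConstants.COMMENT;

        StringWriter out = new StringWriter();
        try {
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            XMLEventReader raw =
                    inputFactory.createXMLEventReader(new StringReader(xml));
            XMLEventReader filtered =
                    inputFactory.createFilteredReader(raw, noComments);
            XMLEventWriter writer =
                    XMLOutputFactory.newInstance().createXMLEventWriter(out);

            // Streams every accepted event straight to the writer.
            writer.add(filtered);
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not copy XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("<a><!-- note --><b>x</b></a>"));
    }
}
```

For a real file you would pass a FileInputStream and a FileOutputStream to the factories instead of the string reader and writer used here.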
The code below renames Name to AuthorName, adds a comment inside it, and replaces "Hello" with "Oh" in text content:
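import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class StAx {
    public static void main(String[] args) {
        String filename = "HelloWorld.xml";
        // System.out is left out of try-with-resources so that it is not closed.
        try (InputStream in = new FileInputStream(filename)) {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLOutputFactory xof = XMLOutputFactory.newInstance();
            XMLEventFactory ef = XMLEventFactory.newInstance();
            XMLEventReader reader = factory.createXMLEventReader(filename, in);
            XMLEventWriter writer = xof.createXMLEventWriter(System.out);
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isCharacters()) {
                    // Replace "Hello" with "Oh" in text content.
                    String data = event.asCharacters().getData();
                    if (data.contains("Hello")) {
                        event = ef.createCharacters(data.replace("Hello", "Oh"));
                    }
                    writer.add(event);
                } else if (event.isStartElement()) {
                    StartElement s = event.asStartElement();
                    String tagName = s.getName().getLocalPart();
                    if (tagName.equals("Name")) {
                        // Rename the element, keeping its attributes and namespaces.
                        writer.add(ef.createStartElement(new QName("Author" + tagName),
                                s.getAttributes(), s.getNamespaces()));
                        writer.add(ef.createCharacters("\n "));
                        writer.add(ef.createComment("auto generated comment"));
                    } else {
                        writer.add(event);
                    }
                } else if (event.isEndElement()
                        && event.asEndElement().getName().getLocalPart().equals("Name")) {
                    // Rename the end tag explicitly rather than relying on the
                    // writer to close whatever element happens to be open.
                    writer.add(ef.createEndElement(new QName("AuthorName"), null));
                } else {
                    writer.add(event);
                }
            }
            writer.flush();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}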
Input (HelloWorld.xml)
<?xml version="1.0"?>
<BookCatalogue>
<Book>
<Title>HelloLord</Title>
<Name>
<first>New</first>
<last>Earth</last>
</Name>
<ISBN>12345</ISBN>
</Book>
<Book>
<Title>HelloWord</Title>
<Name>
<first>New</first>
<last>Moon</last>
</Name>
<ISBN>12346</ISBN>
</Book>
</BookCatalogue>
Output
<?xml version="1.0"?><BookCatalogue>
<Book>
<Title>OhLord</Title>
<AuthorName>
<!--auto generated comment-->
<first>New</first>
<last>Earth</last>
</AuthorName>
<ISBN>12345</ISBN>
</Book>
<Book>
<Title>OhWord</Title>
<AuthorName>
<!--auto generated comment-->
<first>New</first>
<last>Moon</last>
</AuthorName>
<ISBN>12346</ISBN>
</Book>
</BookCatalogue>
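For the duplicate removal asked about in the question, the same event-copy loop can be extended to buffer one record at a time. The sketch below is not from the original answer: it assumes a duplicate is a Book whose ISBN has already been seen (the question does not say what makes two nodes duplicates), and the class name DedupeBooks is made up. Only the current Book and the set of seen ISBNs are held in memory, never the whole document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper; "duplicate" is assumed to mean "same ISBN".
final class DedupeBooks {

    /** Copies the XML, keeping only the first Book for each ISBN. */
    static String dedupe(String xml) {
        StringWriter out = new StringWriter();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            XMLEventWriter writer = XMLOutputFactory.newInstance()
                    .createXMLEventWriter(out);
            Set<String> seenIsbns = new HashSet<>();

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                boolean bookStart = event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals("Book");
                if (!bookStart) {
                    writer.add(event); // everything outside a Book is copied as-is
                    continue;
                }
                // Buffer the whole Book, then decide whether to write it.
                List<XMLEvent> record = new ArrayList<>();
                record.add(event);
                String isbn = bufferRecord(reader, record);
                if (seenIsbns.add(isbn)) { // add() is false for an ISBN seen before
                    for (XMLEvent buffered : record) {
                        writer.add(buffered);
                    }
                }
            }
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not process XML", e);
        }
        return out.toString();
    }

    /** Buffers events up to the matching end of Book and returns the ISBN text. */
    private static String bufferRecord(XMLEventReader reader, List<XMLEvent> record)
            throws XMLStreamException {
        StringBuilder isbn = new StringBuilder();
        boolean inIsbn = false;
        int depth = 1; // the Book start element has already been consumed
        while (depth > 0) {
            XMLEvent event = reader.nextEvent();
            record.add(event);
            if (event.isStartElement()) {
                depth++;
                inIsbn = event.asStartElement().getName().getLocalPart().equals("ISBN");
            } else if (event.isEndElement()) {
                depth--;
                inIsbn = false;
            } else if (inIsbn && event.isCharacters()) {
                isbn.append(event.asCharacters().getData());
            }
        }
        // Books without an ISBN all share the empty key; adjust if that matters.
        return isbn.toString().trim();
    }
}
```

This avoids the marshal-to-string step from the question: the events of each kept record are written back unchanged, so nothing has to be spliced into the original text. The whitespace that followed a dropped Book is still copied, so the output may contain an empty line where the duplicate was.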
As you can see, things get really complicated when the modification is more involved than this, such as swapping two nodes, or deleting a node based on the state of several other nodes (for example, deleting all books whose price is higher than the average price).
The best solution in such cases is to produce the resulting XML with an XSLT transformation.
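A minimal sketch of the XSLT route for the "price above average" example, run through the standard javax.xml.transform API. The Price element is an assumption (the sample catalogue above has none), and the class name DropExpensiveBooks is made up. Keep in mind that an XSLT 1.0 processor such as the one bundled with the JDK typically builds an in-memory tree of the input, so this trades memory for simplicity.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper; assumes each Book has a numeric Price child.
final class DropExpensiveBooks {

    // Identity transform plus one empty template that swallows every Book
    // whose Price is above the average Price of its sibling Books.
    private static final String XSLT =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:template match='@*|node()'>"
          + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
          + "</xsl:template>"
          + "<xsl:template match='Book[Price &gt;"
          + " sum(../Book/Price) div count(../Book/Price)]'/>"
          + "</xsl:stylesheet>";

    /** Returns the XML with all above-average-price Books removed. */
    static String transform(String xml) {
        StringWriter out = new StringWriter();
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(transform("<BookCatalogue>"
                + "<Book><Price>10</Price></Book>"
                + "<Book><Price>60</Price></Book>"
                + "</BookCatalogue>"));
    }
}
```

The whole rule lives in the one match pattern, which is why this kind of cross-node condition is easier to express in XSLT than in a hand-written event loop.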