I want to write some text that contains whitespace characters such as newline
and tab
into an xml file so I use
Element element = xmldoc.createElement("TestElement");
element.appendChild(xmldoc.createCDATASection(somestring));
but when I read this back in using
Node vs = xmldoc.getElementsByTagName("TestElement").item(0);
String x = vs.getFirstChild().getNodeValue();
I get a string that has no newlines anymore.
When i look directly into the xml on disk, the newlines seem preserved. so the problem occurs when reading in the xml file.
How can I preserve the newlines?
Thanks!
I don't know how you parse and write your document, but here's an enhanced code example based on yours:
// creating the document in-memory
Document xmldoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
Element element = xmldoc.createElement("TestElement");
xmldoc.appendChild(element);
element.appendChild(xmldoc.createCDATASection("first line\nsecond line\n"));
// serializing the xml to a string
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl =
(DOMImplementationLS)registry.getDOMImplementation("LS");
LSSerializer writer = impl.createLSSerializer();
String str = writer.writeToString(xmldoc);
// printing the xml for verification of whitespace in cdata
System.out.println("--- XML ---");
System.out.println(str);
// de-serializing the xml from the string
final Charset charset = Charset.forName("utf-16");
final ByteArrayInputStream input = new ByteArrayInputStream(str.getBytes(charset));
Document xmldoc2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);
Node vs = xmldoc2.getElementsByTagName("TestElement").item(0);
final Node child = vs.getFirstChild();
String x = child.getNodeValue();
// print the value, yay!
System.out.println("--- Node Text ---");
System.out.println(x);
The serialization using LSSerializer is the W3C way to do it (see here). The output is as expected, with line separators:
--- XML ---
<?xml version="1.0" encoding="UTF-16"?>
<TestElement><![CDATA[first line
second line ]]></TestElement>
--- Node Text ---
first line
second line