Strip whitespace and newlines from XML in Java

Jannis Ioannou picture Jannis Ioannou · Jul 28, 2012 · Viewed 45.1k times · Source

Using Java, I would like to take a document in the following format:

<tag1>
 <tag2>
    <![CDATA[  Some data ]]>
 </tag2>
</tag1>

and convert it to:

<tag1><tag2><![CDATA[  Some data ]]></tag2></tag1>

I tried the following, but it isn't giving me the result I am expecting:

DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
dbfac.setIgnoringElementContentWhitespace(true);
DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
Document doc = docBuilder.parse(new FileInputStream("/tmp/test.xml"));

Writer out = new StringWriter();
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.INDENT, "no");
tf.transform(new DOMSource(doc), new StreamResult(out));
System.out.println(out.toString());

Answer

Wolfgang picture Wolfgang · Oct 31, 2012

Working solution following instructions in the question's comments by @Luiggi Mendoza.

public static String trim(String input) {
    BufferedReader reader = new BufferedReader(new StringReader(input));
    StringBuffer result = new StringBuffer();
    try {
        String line;
        while ( (line = reader.readLine() ) != null)
            result.append(line.trim());
        return result.toString();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}