How to deal with invalid characters in a WS output when using CXF?

Elias Dorneles · Mar 14, 2012 · Viewed 26.3k times

I'm using Spring, CXF and Hibernate to build a web service that performs search queries on a foreign database to which I have read-only access.

The problem is that some entries in the database have control characters (0x2) in text fields, and it seems that CXF, or the library (Aegis?) it uses to serialize the objects returned from the Hibernate session, can't deal with them:

org.apache.cxf.aegis.DatabindingException: Error writing document.. Nested exception is com.ctc.wstx.exc.WstxIOException: Invalid white space character (0x2) in text to output (in xml 1.1, could output as a character entity)

How do I get around that? Ideally, I could just remove those characters, since they don't matter for my output... Thanks!

Answer

nDijax · Nov 13, 2012
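/**
 * Replaces every character that is not allowed in XML 1.0.<br>
 * Valid chars according to the XML spec:<br>
 * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]<br>
 * i.e. any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.<br>
 * @param text The String to clean
 * @param replacement The string to be substituted for each match
 * @return The resulting String
 */
public static String CleanInvalidXmlChars(String text, String replacement) {
    // \x{...} (Java 7+) names the supplementary range by code point instead of by surrogate pairs.
    String re = "[^\\x09\\r\\n\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]";
    // quoteReplacement keeps '$' and '\' in the replacement from being read as group references.
    return text.replaceAll(re, java.util.regex.Matcher.quoteReplacement(replacement));
}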

source: http://www.theplancollection.com/house-plan-related-articles/hexadecimal-value-invalid-character
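
If you would rather avoid the regex, the same XML 1.0 rule can be checked one code point at a time. This is only a minimal sketch, not something CXF or Aegis provides: the class name XmlCharFilter and both method names are made up for illustration. It assumes you call it on each text field yourself (for example when copying Hibernate results into the objects you return) before the response is serialized.

```java
// Hypothetical helper, not part of CXF, Aegis or Woodstox.
final class XmlCharFilter {

    private XmlCharFilter() {}

    /** True if the code point is in the XML 1.0 Char ranges listed in the answer above. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Copies text, substituting replacement for every code point that XML 1.0 forbids. */
    static String stripInvalidXmlChars(String text, String replacement) {
        if (text == null) {
            return null; // nullable database columns pass through unchanged
        }
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // codePointAt joins a valid surrogate pair into one supplementary code point;
            // an unpaired surrogate comes back as itself and fails the range check.
            int cp = text.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

With this sketch, XmlCharFilter.stripInvalidXmlChars("foo\u0002bar", "") returns "foobar", so the 0x2 character from the exception never reaches the XML writer. The replacement is appended literally, so characters such as '$' need no escaping.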