I'm using Xerces to parse my XML document. The issue is that XML escaped characters such as &amp; arrive in the characters() method already unescaped. I need to receive the escaped characters inside characters() as is.
Thanks.
UPD: I tried overriding the resolveEntity() method in my DefaultHandler's descendant. I can see in the debugger that it is set as the entity resolver on the XML reader, but the code in the overridden method is never invoked.
I think your solution is not too bad: a few lines of code to do exactly what you want.
The problem is that the startEntity and endEntity methods are not provided by the ContentHandler interface, so you have to write a LexicalHandler that works in combination with your ContentHandler.
Usually, using an XMLFilter is more elegant, but since you have to work with entities, you still have to write a LexicalHandler. Take a look here for an introduction to the use of SAX filters.
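To make the split between the two interfaces concrete, here is a minimal sketch of registering a LexicalHandler next to a ContentHandler. The class name EntityBoundaryDemo and the declared entity who are made up for illustration, and the sketch assumes the JDK's built-in SAX parser. It uses DefaultHandler2 from org.xml.sax.ext, which provides empty implementations of both interfaces. Whether predefined references such as &amp; are reported through startEntity depends on the parser, so the example declares its own entity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class, not part of the original answer.
class EntityBoundaryDemo {

    // Parses the given XML and records entity boundaries and character data in order.
    static List<String> collect(String xml) throws Exception {
        List<String> events = new ArrayList<>();

        // DefaultHandler2 implements both ContentHandler and LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startEntity(String name) {
                events.add("start:" + name);
            }

            @Override
            public void endEntity(String name) {
                events.add("end:" + name);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("chars:" + new String(ch, start, length));
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // The lexical handler is registered as a property, not through a setter.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        // The entity "who" is declared in the internal DTD subset.
        String xml = "<!DOCTYPE p [<!ENTITY who \"world\">]><p>hello &who;</p>";
        collect(xml).forEach(System.out::println);
    }
}
```

The character data of the entity arrives between the start and end notifications, which is the hook the filter approach relies on.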
I'd like to show you a way, very similar to yours, which lets you separate the filtering operations (turning & back into &amp;, for instance) from the output operations (or whatever else you do with the text). I've written my own XMLFilter based on XMLFilterImpl, which also implements the LexicalHandler interface. This filter contains only the code related to escaping/unescaping entities.
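
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXNotRecognizedException;
    import org.xml.sax.SAXNotSupportedException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.ext.LexicalHandler;
    import org.xml.sax.helpers.XMLFilterImpl;

    public class XMLFilterEntityImpl extends XMLFilterImpl implements LexicalHandler {

        // Name of the entity whose expansion is about to be reported, or null
        private String currentEntity = null;

        public XMLFilterEntityImpl(XMLReader reader)
                throws SAXNotRecognizedException, SAXNotSupportedException {
            super(reader);
            // Register this filter as the lexical handler of the wrapped reader
            setProperty("http://xml.org/sax/properties/lexical-handler", this);
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            if (currentEntity == null) {
                // Ordinary text: pass it through unchanged
                super.characters(ch, start, length);
                return;
            }
            // Inside an entity: forward the reference instead of its expansion
            String entity = "&" + currentEntity + ";";
            super.characters(entity.toCharArray(), 0, entity.length());
            currentEntity = null;
        }

        @Override
        public void startEntity(String name) throws SAXException {
            currentEntity = name;
        }

        @Override
        public void endEntity(String name) throws SAXException {
            // Reset here as well, in case the entity produced no character data
            currentEntity = null;
        }

        // The remaining LexicalHandler callbacks are not needed here

        @Override
        public void startDTD(String name, String publicId, String systemId)
                throws SAXException {
        }

        @Override
        public void endDTD() throws SAXException {
        }

        @Override
        public void startCDATA() throws SAXException {
        }

        @Override
        public void endCDATA() throws SAXException {
        }

        @Override
        public void comment(char[] ch, int start, int length) throws SAXException {
        }
    }
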
And this is my main, with a DefaultHandler as the ContentHandler, which receives the entity as is thanks to the filter code:

    public static void main(String[] args) throws SAXException, IOException {
        DefaultHandler defaultHandler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length)
                    throws SAXException {
                // This method receives the entity as is
                System.out.println(new String(ch, start, length));
            }
        };

        XMLFilter xmlFilter = new XMLFilterEntityImpl(XMLReaderFactory.createXMLReader());
        xmlFilter.setContentHandler(defaultHandler);

        String xml = "<html><head><title>title</title></head><body>&amp;</body></html>";
        xmlFilter.parse(new InputSource(new StringReader(xml)));
    }

And this is my output:

    title
    &amp;

You may not like it, but it is an alternative solution.
I'm sorry, but with a SAX parser I don't think there is a more elegant way.
You should also consider switching to a StAX parser: it's very easy to do what you want with XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES set to false. If you like this solution, you should take a look here.
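As a rough sketch of the StAX route, the code below turns off entity replacement through the IS_REPLACING_ENTITY_REFERENCES property of javax.xml.stream.XMLInputFactory and rebuilds each reference from the ENTITY_REFERENCE event. The class name StaxEntityDemo and the declared entity who are invented for illustration, and the sketch assumes the StAX implementation bundled with the JDK. Whether predefined references such as &amp; are also left unexpanded depends on the implementation, so it is worth checking with your parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the original answer.
class StaxEntityDemo {

    // Returns the text pieces of the document, with unexpanded references written as &name;
    static List<String> read(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to report entity references instead of expanding them.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);

        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> parts = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.ENTITY_REFERENCE) {
                    // getLocalName() gives the entity name for this event type.
                    parts.add("&" + reader.getLocalName() + ";");
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    parts.add(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return parts;
    }

    public static void main(String[] args) throws XMLStreamException {
        // The entity "who" is declared in the internal DTD subset.
        String xml = "<!DOCTYPE p [<!ENTITY who \"world\">]><p>hello &who;!</p>";
        System.out.println(String.join("", read(xml)));
    }
}
```

Compared with the SAX filter, no separate lexical handler is needed: the reference shows up as its own event in the same pull loop as the text.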