SAX parsing and special characters

giorgos_412 · Nov 11, 2012

I want to parse some data from an XML file using a SAX parser. My XML is as follows:

<categories>
 <cat>Pies &amp; past</cat>
 <cat>Fruits</cat>
</categories>

In order to parse this data I extend DefaultHandler.

The output after parsing is:

cat 1 = Pies

cat 2 = &

cat 3 = past

cat 4 = Fruits

Why do I get this output instead of:

cat 1 = Pies & past

cat 2 = Fruits

Answer

Ted Hopp · Nov 11, 2012

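My guess is that you are treating each call to characters as delivering the complete text of a cat element. A SAX parser is allowed to deliver an element's text in several chunks, and parsers commonly split around entity references such as &amp;, which matches the output you are seeing. You should code your handler so that successive calls to characters accumulate the text, and capture it only on the endElement event: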

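import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CatHandler extends DefaultHandler {
    // Accumulates the text of the current cat element across characters() calls.
    private final StringBuilder chars = new StringBuilder();

    @Override
    public void startElement(String uri, String lName, String qName, Attributes a) {
        // SAX passes an empty string (not null) when a name is unavailable.
        final String name = (qName == null || qName.isEmpty()) ? lName : qName;
        if ("cat".equals(name)) {
            chars.setLength(0); // start with an empty buffer for this element
        }
        // else: handle other elements here
    }

    @Override
    public void endElement(String uri, String lName, String qName) {
        final String name = (qName == null || qName.isEmpty()) ? lName : qName;
        if ("cat".equals(name)) {
            String catName = chars.toString(); // the complete text of the element
            // do something with cat name
        }
        // else: handle other elements here
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append every chunk.
        chars.append(ch, start, length);
    }
}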
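
To see the accumulate-then-capture approach run end to end, here is a minimal, self-contained sketch that parses the XML from the question with the standard JAXP SAXParserFactory. The CatDemo class and its parseCats helper are hypothetical names made up for this illustration, and the sketch assumes a default (non-namespace-aware) parser, so it compares against qName only:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CatDemo {
    // Hypothetical helper: returns the full text of every cat element in the XML.
    static List<String> parseCats(String xml) throws Exception {
        final List<String> cats = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder chars = new StringBuilder();

            @Override
            public void startElement(String uri, String lName, String qName, Attributes a) {
                if ("cat".equals(qName)) {
                    chars.setLength(0); // reset the buffer at the start of each cat
                }
            }

            @Override
            public void endElement(String uri, String lName, String qName) {
                if ("cat".equals(qName)) {
                    cats.add(chars.toString()); // text is complete only here
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                chars.append(ch, start, length); // may run several times per element
            }
        };
        // Assumption: the factory default is a non-namespace-aware parser.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return cats;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<categories>\n <cat>Pies &amp; past</cat>\n <cat>Fruits</cat>\n</categories>";
        List<String> cats = parseCats(xml);
        for (int i = 0; i < cats.size(); i++) {
            System.out.println("cat " + (i + 1) + " = " + cats.get(i));
        }
    }
}
```

However the parser chooses to split the text, the list ends up with exactly two entries, "Pies & past" and "Fruits", because each name is only read once its closing tag has been reached.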