Android org.xmlpull.v1.XmlPullParserException while parsing XML

Aamir · Apr 8, 2012 · Viewed 10.9k times

I call a web service that returns some HTML in an XML envelope, like this:

<xml version="1.0" cache="false">
<head/>
<body>
<table>
<tr>
   <td>
        <a href="link-to-prev-post">
           <text color="red"><< Prev</text>
        </a>
   </td>
   <td>
        <a href="link-to-next-post">
           <text color="red">| Next >></text>
        </a>
   </td>
</tr>
</table>
</body>
</xml>

I need to retrieve the link-to-prev-post and link-to-next-post links so that I can fetch more data through them.

I am using XmlPullParser to parse the XML/HTML above. To get the links for the next/prev items, I do the following:

if (xmlNodeName.equalsIgnoreCase("a")) {
    link = parser.getAttributeValue(null, "href");
} else if (xmlNodeName.equalsIgnoreCase("text")) {
    color = parser.getAttributeValue(null, "color");

    if (color.equalsIgnoreCase("red") && parser.getEventType() == XmlPullParser.START_TAG) {
        // check for next/prev blog entries links
        // but this parser.nextText() throws XmlPullParserException
        // i think because the nextText() returns << Prev which the parser considers to be wrong
        String innerText = parser.nextText();
        if (innerText.contains("<< Prev")) {
            blog.setPrevBlogItemsUrl(link);
        } else if (innerText.contains("Next >>")) {
            blog.setNextBlogItemsUrl(link);
        }
    }

    link = null;
}

The call to parser.nextText() throws an XmlPullParserException, and the value of the text element at that point is << Prev. I think the parser mistakes this value for a start tag because of the << in the text.

LogCat detail is:

04-08 18:32:09.827: W/System.err(688): org.xmlpull.v1.XmlPullParserException: precondition: START_TAG (position:END_TAG </text>@9:2535 in java.io.InputStreamReader@44c6d0d8) 
04-08 18:32:09.827: W/System.err(688):  at org.kxml2.io.KXmlParser.exception(KXmlParser.java:245)
04-08 18:32:09.827: W/System.err(688):  at org.kxml2.io.KXmlParser.nextText(KXmlParser.java:1382)
04-08 18:32:09.827: W/System.err(688):  at utilities.XMLParserHelper.parseBlogEntries(XMLParserHelper.java:139)
04-08 18:32:09.827: W/System.err(688):  at serviceclients.PlayerSummaryAsyncTask.doInBackground(PlayerSummaryAsyncTask.java:68)
04-08 18:32:09.827: W/System.err(688):  at serviceclients.PlayerSummaryAsyncTask.doInBackground(PlayerSummaryAsyncTask.java:1)
04-08 18:32:09.836: W/System.err(688):  at android.os.AsyncTask$2.call(AsyncTask.java:185)
04-08 18:32:09.836: W/System.err(688):  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:305)
04-08 18:32:09.836: W/System.err(688):  at java.util.concurrent.FutureTask.run(FutureTask.java:137)
04-08 18:32:09.836: W/System.err(688):  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1068)
04-08 18:32:09.836: W/System.err(688):  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:561)
04-08 18:32:09.836: W/System.err(688):  at java.lang.Thread.run(Thread.java:1096)

I hope I have clarified my problem.

Solution

Inspired by Martin's approach of first converting the received data to a string, I solved my problem with a mixed approach.

  1. Convert the received InputStream's contents to a string and replace the erroneous characters with * (or whatever you wish), as follows:

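    InputStreamReader isr = new InputStreamReader(serviceReturnedStream);
    BufferedReader br = new BufferedReader(isr);
    StringBuilder xmlAsString = new StringBuilder(512);
    String line;
    try {
        while ((line = br.readLine()) != null) {
            // Replace the stray "<<" and ">>" so the parser no longer trips over them.
            // readLine() drops the line terminator, so re-append one to keep
            // tokens on adjacent lines from running together.
            xmlAsString.append(line.replace("<<", "*").replace(">>", "*")).append('\n');
        }
    } catch (IOException e) {
        e.printStackTrace();
    }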
    
  2. Now I have a string that contains well-formed XML (for my case), so I just use the normal XmlPullParser to parse it instead of parsing it manually:

    XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
    
    factory.setNamespaceAware(false);
    
    XmlPullParser parser = factory.newPullParser();
    parser.setInput(new StringReader(xmlAsString.toString()));
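
Note that replacing << with * changes the link text, so a later check such as innerText.contains("<< Prev") would no longer match. One possible variation, sketched here under the assumption that the only defect in the response is a literal < inside text, is to escape each stray < as &lt; so the parser hands back the original text. The class and method names are hypothetical, and the pattern is a heuristic (a < followed directly by a letter would still be read as a tag), not a general HTML repair.

```java
import java.util.regex.Pattern;

public class StrayAngleBracketEscaper {
    // A '<' that is NOT followed by a name start character, '/', '!' or '?'
    // cannot begin a tag, comment, CDATA section or processing instruction,
    // so it is treated here as literal text.
    private static final Pattern STRAY_LT = Pattern.compile("<(?![A-Za-z_:/!?])");

    /** Replaces every stray '<' with the entity reference "&lt;". */
    public static String escapeStrayLessThan(String xml) {
        return STRAY_LT.matcher(xml).replaceAll("&lt;");
    }
}
```

The >> can stay untouched in this variation, because a literal > is allowed in XML character data (except as part of ]]>). The escaped string can then be passed to parser.setInput(new StringReader(...)) as in step 2.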
    

Hope this helps someone!

Answer

Martin Nordholts · Apr 11, 2012

Yes, the exception is probably thrown because that is not well-formed XML, as per section 2.4, Character Data and Markup, of the XML 1.0 specification:

[...] the left angle bracket (<) MUST NOT appear in [its] literal form, [...]

If you put that XML in Eclipse, Eclipse will complain about the XML being invalid. If you are able to fix the web service, you should fix the generated XML, either by using entity references such as &lt; or by using CDATA.
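
As a rough illustration of those two server-side fixes, here is a minimal sketch, assuming the service generates its output in Java (the question does not say). XmlTextEscaper and its methods are hypothetical names, not part of any library:

```java
public class XmlTextEscaper {
    /**
     * Escapes '&', '<' and '>' as entity references. Escaping '>' is
     * optional in most positions, but it is harmless.
     */
    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps text in a CDATA section, splitting any "]]>" it contains. */
    public static String cdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }
}
```

With either form, for example <text color="red">&lt;&lt; Prev</text> or <text color="red"><![CDATA[<< Prev]]></text>, a conforming parser should report the element text as << Prev.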

If you have no control over the web service, I think the easiest option is to parse the response manually with some custom code, perhaps using regular expressions, depending on how general the solution needs to be.

Example Code

Here's how you could parse the XML file above. Note that you probably want to improve this code to make it more general, but you should have something to start with at least:

    // Read the XML into a StringBuilder so we can get a Matcher for the
    // whole XML
    // getXmlResponseStream() is a hypothetical placeholder: get the InputStream to the XML somehow
    InputStream xmlResponseInputStream = getXmlResponseStream();
    InputStreamReader isr = new InputStreamReader(xmlResponseInputStream);
    BufferedReader br = new BufferedReader(isr);
    StringBuilder xmlAsString = new StringBuilder(512);
    String line;
    try {
        while ((line = br.readLine()) != null) {
            xmlAsString.append(line);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

    // Look for links using a regex. Assume the first link is "Prev" and the
    // next link is "Next"
    Pattern hrefRegex = Pattern.compile("<a href=\"([^\"]*)\">");
    Matcher m = hrefRegex.matcher(xmlAsString);
    String linkToPrevPost = null;
    String linkToNextPost = null;
    while (m.find()) {
        String hrefValue = m.group(1);
        if (linkToPrevPost == null) {
            linkToPrevPost = hrefValue;
        } else {
            linkToNextPost = hrefValue;
        }
    }

    Log.i("Example", "'Prev' link = " + linkToPrevPost + 
            " 'Next' link = " + linkToNextPost);

With your XML file, the output to logcat will be:

I/Example (12399): 'Prev' link = link-to-prev-post 'Next' link = link-to-next-post
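
The example above relies on the "Prev" link coming first. If that ordering is not guaranteed (for instance, when only a "Next" link is present), a possible refinement is to capture each href together with the label text that follows it. This is only a sketch: NavLinkExtractor and the "prev"/"next" map keys are hypothetical names, and the pattern assumes each <a> is immediately followed by a <text> element as in the sample response.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NavLinkExtractor {
    // Matches <a href="..."> followed by a <text ...> element and captures
    // the href (group 1) and the raw label text (group 2).
    private static final Pattern LINK_WITH_LABEL = Pattern.compile(
            "<a\\s+href=\"([^\"]*)\"\\s*>\\s*<text[^>]*>(.*?)</text>",
            Pattern.DOTALL);

    /** Returns a map with the keys "prev" and/or "next" for the links found. */
    public static Map<String, String> extract(String xml) {
        Map<String, String> links = new LinkedHashMap<String, String>();
        Matcher m = LINK_WITH_LABEL.matcher(xml);
        while (m.find()) {
            String href = m.group(1);
            String label = m.group(2);
            // Decide by the label rather than by position in the document.
            if (label.contains("Prev")) {
                links.put("prev", href);
            } else if (label.contains("Next")) {
                links.put("next", href);
            }
        }
        return links;
    }
}
```

Calling NavLinkExtractor.extract(xmlAsString.toString()) on the sample response should yield link-to-prev-post under "prev" and link-to-next-post under "next"; a missing link simply leaves its key absent.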