I'm trying to parse an RSS feed from Monster on Android (API level 17) using this URL:
http://rss.jobsearch.monster.com/rssquery.ashx?q=java
To get the content I'm using HttpURLConnection in the following fashion:
this.conn = (HttpURLConnection) url.openConnection();
this.conn.setConnectTimeout(5000);
this.conn.setReadTimeout(10000);
this.conn.setUseCaches(true);
conn.addRequestProperty("Content-Type", "text/xml; charset=utf-8");
is = new InputStreamReader(url.openStream());
What comes back is, as far as I can tell (and I verified it), a legitimate RSS response with these headers:
Cache-Control:private
Connection:Keep-Alive
Content-Encoding:gzip
Content-Length:5958
Content-Type:text/xml
Date:Wed, 06 Mar 2013 17:15:20 GMT
P3P:CP=CAO DSP COR CURa ADMa DEVa IVAo IVDo CONo HISa TELo PSAo PSDo DELa PUBi BUS LEG PHY ONL UNI PUR COM NAV INT DEM CNT STA HEA PRE GOV OTC
Server:Microsoft-IIS/7.5
Vary:Accept-Encoding
X-AspNet-Version:2.0.50727
X-Powered-By:ASP.NET
It starts like this (click the URL above if you want to see complete XML):
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Monster Job Search Results java</title>
<description>RSS Feed for Monster Job Search</description>
<link>http://rss.jobsearch.monster.com/rssquery.ashx?q=java</link>
But when I attempt to parse it:
final XmlPullParser xpp = getPullParser();
xpp.setInput(is);
for (int type = xpp.getEventType(); type != XmlPullParser.END_DOCUMENT; type = xpp.next()) { /* parsing goes here */ }
The code immediately chokes on type = xpp.next() with the following exception:
03-06 09:27:27.796: E/AbsXmlResultParser(13363): org.xmlpull.v1.XmlPullParserException:
Unexpected token (position:TEXT @1:2 in java.io.InputStreamReader@414b4538)
This means the parser cannot process the 2nd character on line 1, which should be part of <?xml version="1.0" encoding="utf-8"?>
Here are the offending lines in KXmlParser.java (425-426), where type == TEXT evaluates to true:
if (depth == 0 && (type == ENTITY_REF || type == TEXT || type == CDSECT)) {
throw new XmlPullParserException("Unexpected token", this, null);
}
Any help? I tried setting XmlPullParser.FEATURE_PROCESS_DOCDECL to false on the parser, but that didn't help. I researched this on the web and here, and couldn't find anything that helps.
The reason you are getting the error is that the XML file doesn't actually start with <?xml version="1.0" encoding="utf-8"?>. It starts with three special bytes, EF BB BF, which are the UTF-8 byte order mark (BOM). InputStreamReader doesn't handle these bytes automatically, so the decoded BOM character appears to reach the parser as text before the XML declaration, which matches the TEXT @1:2 position in your exception. You have to handle these bytes manually. The simplest way to do it is to use BOMInputStream, available in the Apache Commons IO library:
this.conn = (HttpURLConnection) url.openConnection();
this.conn.setConnectTimeout(5000);
this.conn.setReadTimeout(10000);
this.conn.setUseCaches(true);
conn.addRequestProperty("Content-Type", "text/xml; charset=utf-8");
is = new InputStreamReader(new BOMInputStream(conn.getInputStream(), false, ByteOrderMark.UTF_8));
I've checked the code above and it works well for me.
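If pulling in Commons IO is not an option, the same idea can be sketched with only java.io classes. This is a minimal sketch, not the Commons IO implementation: the class name BomSkipper and the method skipUtf8Bom are hypothetical names introduced here for illustration, and it only handles the UTF-8 BOM (EF BB BF) described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper (not part of Android or Commons IO).
final class BomSkipper {

    /**
     * Returns a stream positioned after a leading UTF-8 BOM (EF BB BF).
     * If no BOM is present, the bytes read while checking are pushed back,
     * so the caller sees the stream from its first byte.
     */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until
        // three bytes have been read or the stream ends.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: put the bytes back so nothing is lost.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

Assuming the same conn and is variables as in the snippet above, it would be used as is = new InputStreamReader(BomSkipper.skipUtf8Bom(conn.getInputStream()), "UTF-8"); passing the charset explicitly also avoids relying on the platform default.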