I'm trying to parse RSS 2.0 and Atom feeds using SyndicationFeedFormatter and SyndicationFeed objects, but I'm getting XmlExceptions while parsing DateTime fields such as pubDate and/or lastBuildDate.
Wed, 24 Feb 2010 18:56:04 GMT+00:00 does not work
Wed, 24 Feb 2010 18:56:04 GMT works
So it's throwing because of the timezone offset appended after GMT.
As a workaround, for familiar feeds I would manually fix those DateTime nodes - by catching the XmlException, loading the RSS into an XmlDocument, fixing those nodes' values, creating a new XmlReader and then returning the formatter from this new XmlReader object (code not shown). But for this approach to work, I need to know beforehand which nodes cause the exception.
SyndicationFeedFormatter syndicationFeedFormatter = null;
XmlReaderSettings settings = new XmlReaderSettings();
using (XmlReader reader = XmlReader.Create(url, settings))
{
    try
    {
        syndicationFeedFormatter = SyndicationFormatterFactory.CreateFeedFormatter(reader);
        syndicationFeedFormatter.ReadFrom(reader);
    }
    catch (XmlException xexp)
    {
        // fix those datetime nodes with exceptions and read again.
    }
    return syndicationFeedFormatter;
}
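For reference, the XmlDocument-based workaround described above could be sketched roughly as follows. This is only an illustration, not the code I actually use: the class and method names are hypothetical, it assumes an RSS 2.0 feed (un-namespaced pubDate and lastBuildDate elements), and it assumes the only problem is the literal suffix GMT+00:00.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.ServiceModel.Syndication;
using System.Xml;

static class FeedDateRepair
{
    // Hypothetical helper: rewrites the date nodes in an already loaded
    // RSS document, then parses the repaired document.
    public static SyndicationFeed LoadWithRepairedDates(XmlDocument doc)
    {
        // Copy the matches into a list first so the document is not
        // modified while the XPath result is still being enumerated.
        List<XmlNode> dateNodes = doc
            .SelectNodes("//pubDate | //lastBuildDate")
            .Cast<XmlNode>()
            .ToList();

        foreach (XmlNode node in dateNodes)
        {
            // Assumption: the only unparseable form is "GMT+00:00".
            node.InnerText = node.InnerText.Replace("GMT+00:00", "GMT");
        }

        // XmlNodeReader reads straight from the in-memory document.
        using (XmlReader reader = new XmlNodeReader(doc))
        {
            return SyndicationFeed.Load(reader);
        }
    }
}
```

The document itself can be filled with doc.Load(url) before calling the helper.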
RSS feed: http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&q=test&cf=all&output=rss
Exception details:
XmlException Error in line 1 position 376. An error was encountered when parsing a DateTime value in the XML.
at System.ServiceModel.Syndication.Rss20FeedFormatter.DateFromString(String dateTimeString, XmlReader reader)
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadXml(XmlReader reader, SyndicationFeed result)
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadFrom(XmlReader reader)
at ... cs:line 171
<rss version="2.0">
<channel>
...
<pubDate>Wed, 24 Feb 2010 18:56:04 GMT+00:00</pubDate>
<lastBuildDate>Wed, 24 Feb 2010 18:56:04 GMT+00:00</lastBuildDate> <-----exception
...
<item>
...
<pubDate>Wed, 24 Feb 2010 16:17:50 GMT+00:00</pubDate>
<lastBuildDate>Wed, 24 Feb 2010 18:56:04 GMT+00:00</lastBuildDate>
</item>
...
</channel>
</rss>
Is there a better way to achieve this? Please help. Thanks.
Here is my hacky workaround for reading Google News RSS feeds.
string xml;
using (WebClient webClient = new WebClient())
{
    xml = Encoding.UTF8.GetString(webClient.DownloadData(url));
}
// Strip the offset that Rss20FeedFormatter cannot parse ("GMT+00:00" -> "GMT").
xml = xml.Replace("+00:00", "");
// Encode as UTF-8 so non-ASCII characters in the feed survive the round trip.
byte[] bytes = Encoding.UTF8.GetBytes(xml);
XmlReader reader = XmlReader.Create(new MemoryStream(bytes));
SyndicationFeed feed = SyndicationFeed.Load(reader);
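The xml.Replace("+00:00", "") call only covers a zero offset. A slightly more general sketch, assuming the feed always writes the zone as GMT followed by a signed hh:mm offset, is to rewrite it into the numeric RFC 822 form (+hhmm), which Rss20FeedFormatter should accept. The class and method names here are made up for illustration; they are not part of System.ServiceModel.Syndication.

```csharp
using System.Text.RegularExpressions;

static class RssDateFixer
{
    // Matches "GMT+hh:mm" or "GMT-hh:mm" and captures sign, hours, minutes.
    private static readonly Regex GmtOffset =
        new Regex(@"\bGMT([+-])(\d{2}):(\d{2})", RegexOptions.Compiled);

    // Hypothetical helper: turns "GMT+00:00" into "+0000" and
    // "GMT-05:30" into "-0530" wherever the pattern occurs in the text.
    public static string NormalizeGmtOffsets(string xml)
    {
        return GmtOffset.Replace(xml, "$1$2$3");
    }
}
```

It would be called in place of the Replace line, for example xml = RssDateFixer.NormalizeGmtOffsets(xml);. Because it runs over the whole document, it would also rewrite the same pattern if it happened to appear inside a title or description.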