I'm pulling the source of a website. I then want to extract a specific part of it. My intention is to do this with LINQ-to-XML.
However, I get errors when I parse the source:
XElement source = XElement.Load(reader);
The problem seems to be references to namespaces I don't have. I get the error: 'addthis' is an undeclared prefix. Line 130, position 51.
due to this line:
<div class="addthis_toolbox addthis_pill_combo" addthis:url="http://www.foo.com/foo">
And if I delete that one, others occur.
Thing is, I only care about one piece of this XML file, so I don't need to be able to parse the whole thing. I just want it in an XElement so I can find that one piece. Is there a way for me to hack around the parsing error? I need a generic solution: I want to parse the file regardless of any undeclared-prefix errors.
Thanks
This XML is not valid.
In order to use a namespace prefix (such as addthis:), the namespace must be declared by writing xmlns:addthis="some URI" on that element or on one of its ancestors.
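To illustrate the point about declarations, here is a minimal sketch showing that the same div parses once the prefix is declared. The URI urn:example:addthis and the class name NamespaceDemo are placeholders made up for this example; the real markup declares no URI at all, which is exactly why the parse fails.

```csharp
using System;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Hypothetical placeholder URI: the page in the question declares none.
    public const string AddThisUri = "urn:example:addthis";

    public static string ReadAddThisUrl()
    {
        // Same element as in the question, plus an xmlns:addthis declaration.
        string xml =
            "<div xmlns:addthis=\"" + AddThisUri + "\" " +
            "class=\"addthis_toolbox addthis_pill_combo\" " +
            "addthis:url=\"http://www.foo.com/foo\" />";

        XElement div = XElement.Parse(xml);

        // Namespaced attributes are looked up by namespace + local name.
        XNamespace addthis = AddThisUri;
        return (string)div.Attribute(addthis + "url");
    }
}
```

This only helps when you control the markup or know every prefix in advance, so it is not the generic fix the question asks for.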
In general, you shouldn't parse HTML using an XML parser, since HTML is likely to be invalid XML, for this reason and a number of other reasons (undeclared entities, unescaped JS, unclosed tags).
Instead, use HTML Agility Pack.
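A rough sketch of what that could look like, assuming the HtmlAgilityPack NuGet package is referenced. The class HtmlExtractor and the method GetAttribute are hypothetical names for this example, and the XPath is up to you:

```csharp
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class HtmlExtractor
{
    // Returns the named attribute of the first node matching the XPath,
    // or null if the node or the attribute is missing.
    public static string GetAttribute(string html, string xpath, string attributeName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant HTML parser, no XML namespace checks

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node?.Attributes[attributeName]?.Value;
    }
}
```

For the div in the question, a call might be HtmlExtractor.GetAttribute(html, "//div[contains(@class,'addthis_toolbox')]", "class"). Because the parser does not enforce XML rules, an undeclared prefix such as addthis: does not stop the document from loading.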