BeautifulSoup and ASP.NET/C#

user300981 · Jul 28, 2010 · Viewed 14.6k times

Has anyone integrated BeautifulSoup with ASP.NET/C# (possibly using IronPython or otherwise)? Is there a BeautifulSoup alternative or a port that works nicely with ASP.NET/C#?

The goal is to use the library to extract the readable text from an arbitrary URL.

Thanks

Answer

Colin Pickard · Jul 28, 2010

Html Agility Pack is a similar project, but for C# and .NET.


EDIT:

To extract all readable text:

doc.DocumentNode.InnerText

Note that this also returns the contents of <script> and <style> tags.

To avoid that, remove those tags before reading InnerText, like this:

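// Needs "using System.Linq;" for ToArray().
// ToArray() snapshots the matches so nodes can be removed while looping.
foreach (var script in doc.DocumentNode.Descendants("script").ToArray())
    script.Remove();
foreach (var style in doc.DocumentNode.Descendants("style").ToArray())
    style.Remove();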

(Credit: SLaks)
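
Putting the pieces above together, here is a minimal sketch of an end-to-end helper, assuming the HtmlAgilityPack NuGet package is referenced. The class and method names (ReadableText, FromHtml, FromUrl, Extract) are made up for illustration and are not part of the library. The entity decoding and whitespace cleanup are additions beyond the snippets above, included because InnerText generally returns entities such as &amp; as written.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical helper class; only the HtmlAgilityPack calls come from the library.
static class ReadableText
{
    // Parse an HTML string that is already in memory.
    public static string FromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return Extract(doc);
    }

    // Download and parse a page; HtmlWeb performs the HTTP request.
    public static string FromUrl(string url)
    {
        var doc = new HtmlWeb().Load(url);
        return Extract(doc);
    }

    private static string Extract(HtmlDocument doc)
    {
        // Snapshot the matches with ToArray() so nodes can be removed
        // without disturbing the enumeration.
        var unwanted = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script" || n.Name == "style")
            .ToArray();
        foreach (var node in unwanted)
            node.Remove();

        // Decode HTML entities, then collapse runs of whitespace.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

It could then be called as ReadableText.FromUrl("http://example.com/") from a page or controller. Real pages may need more cleanup than this (navigation menus, hidden elements, comments), so treat it as a starting point rather than a full readability extractor.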