I am currently using HtmlAgilityPack in a console application to scrape a website. Since the HTML is encoded (it returns encoded characters such as &#39;), I have to decode it before I save the content to my database.
Is there a way to decode the returned html using HtmlAgilityPack without having to use HttpUtility.HtmlDecode? I want to avoid adding System.Web to my console application if possible.
The Html Agility Pack is equipped with a utility class called HtmlEntity. It has a static method with the following signature:
/// <summary>
/// Replace known entities by characters.
/// </summary>
/// <param name="text">The source text.</param>
/// <returns>The result text.</returns>
public static string DeEntitize(string text)
It supports well-known named entities (like &nbsp;) and numerically encoded characters such as &#39; as well.
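As a minimal sketch of how this could be called from a console application: the sample string and variable names below are made up for illustration, and the snippet assumes the HtmlAgilityPack package is already referenced by the project. No reference to System.Web is needed.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack package is referenced

class Program
{
    static void Main()
    {
        // Hypothetical sample text; in a real scraper this would come from
        // a scraped node (for example its InnerText or InnerHtml).
        string encoded = "It&#39;s &lt;b&gt;bold&lt;/b&gt; &amp; simple";

        // DeEntitize replaces known entities with the characters they stand for.
        string decoded = HtmlEntity.DeEntitize(encoded);

        // The decoded string is what you would then save to the database.
        Console.WriteLine(decoded);
    }
}
```

Because DeEntitize is a static method that takes and returns a plain string, it can be applied to any text you have extracted, independently of how the document itself was loaded.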