I am trying to remove unnecessary content from HTML, specifically comments. I found a pretty good solution ("Grabbing meta-tags and comments using HTML Agility Pack"), but the DOCTYPE is treated as a comment and is therefore removed along with the real comments. How can I improve the code below so that the DOCTYPE is preserved?
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(htmlContent);
var nodes = htmlDoc.DocumentNode.SelectNodes("//comment()");
if (nodes != null)
{
    foreach (HtmlNode comment in nodes)
    {
        comment.ParentNode.RemoveChild(comment);
    }
}
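// htmlDoc is the loaded HtmlDocument from the question; requires "using System.Linq;".
htmlDoc.DocumentNode.Descendants()
    .Where(n => n.NodeType == HtmlAgilityPack.HtmlNodeType.Comment)
    .ToList() // copy first so nodes are not removed while enumerating
    .ForEach(n => n.Remove());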
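This strips every comment node from the document. Because HTML Agility Pack treats the DOCTYPE as a comment node as well, the DOCTYPE is removed too unless it is explicitly filtered out.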
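To keep the DOCTYPE, one option is to skip any comment node whose markup starts with "<!DOCTYPE". The following is a minimal sketch, not a definitive implementation: the class and method names (HtmlCommentStripper, StripCommentsKeepDoctype) are made up for illustration, and it assumes that HTML Agility Pack exposes the DOCTYPE as a comment node whose OuterHtml begins with "<!DOCTYPE", as described in the question.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, named here only for illustration.
public static class HtmlCommentStripper
{
    public static string StripCommentsKeepDoctype(string htmlContent)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(htmlContent);

        var comments = htmlDoc.DocumentNode.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Comment)
            // Assumption: the DOCTYPE shows up as a comment node whose
            // markup starts with "<!DOCTYPE", so it is skipped here.
            .Where(n => !n.OuterHtml.TrimStart()
                .StartsWith("<!DOCTYPE", StringComparison.OrdinalIgnoreCase))
            // Copy to a list so nodes are not removed while enumerating.
            .ToList();

        foreach (var comment in comments)
        {
            comment.Remove();
        }

        return htmlDoc.DocumentNode.OuterHtml;
    }
}
```

Calling HtmlCommentStripper.StripCommentsKeepDoctype(htmlContent) should return the markup with ordinary comments removed and the DOCTYPE left in place. The same filter can probably be written in XPath for the SelectNodes version, along the lines of "//comment()[not(starts-with(., '<!DOCTYPE'))]", though that comparison is case-sensitive, so a lowercase "<!doctype" would need separate handling.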