HtmlAgilityPack replace node

Omar · Jul 21, 2011 · Viewed 23.5k times

I want to replace a node with a new node. How can I get the exact position of the node and do a complete replace?

I've tried the following, but I can't figure out how to get the index of the node or which parent node to call ReplaceChild() on.

string html = "<b>bold_one</b><strong>strong</strong><b>bold_two</b>";
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);

var bolds = document.DocumentNode.Descendants().Where(item => item.Name == "b");

foreach (var item in bolds)
{

    string newNodeHtml = GenerateNewNodeHtml();
    HtmlNode newNode = new HtmlNode(HtmlNodeType.Text, document, ?);
    item.ParentNode.ReplaceChild( )
}

Answer

Jeff Mercado · Jul 22, 2011

To create a new node, use the HtmlNode.CreateNode() factory method; do not call the constructor directly.

This code should work for you:

var htmlStr = "<b>bold_one</b><strong>strong</strong><b>bold_two</b>";
var doc = new HtmlDocument();
doc.LoadHtml(htmlStr);

var query = doc.DocumentNode.Descendants("b");
foreach (var item in query.ToList())
{
    var newNodeStr = "<foo>bar</foo>";
    var newNode = HtmlNode.CreateNode(newNodeStr);
    item.ParentNode.ReplaceChild(newNode, item);
}

Note that we need to call ToList() on the query. The loop modifies the document while iterating, so enumerating the query directly would fail; ToList() takes a snapshot of the matching nodes first.


If you wish to replace the node with this string instead:

"some text <b>node</b> <strong>another node</strong>"

The problem is that it is no longer a single node but a series of nodes. You can parse it fine using HtmlNode.CreateNode(), but the node you get back references only the first node of the sequence. To replace with the whole sequence, pass that node's ParentNode to ReplaceChild():

var htmlStr = "<b>bold_one</b><strong>strong</strong><b>bold_two</b>";
var doc = new HtmlDocument();
doc.LoadHtml(htmlStr);

var query = doc.DocumentNode.Descendants("b");
foreach (var item in query.ToList())
{
    var newNodesStr = "some text <b>node</b> <strong>another node</strong>";
    var newHeadNode = HtmlNode.CreateNode(newNodesStr);
    item.ParentNode.ReplaceChild(newHeadNode.ParentNode, item);
}
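
As an alternative to passing the parent node, the replacement nodes can be inserted one by one before the old node, which is then removed. The following is only a sketch and has not been tested against a particular HtmlAgilityPack version. It assumes the InsertBefore() and RemoveChild() methods on HtmlNode, and the helper name ReplaceWithFragment is made up for illustration.

```csharp
using System.Linq;
using HtmlAgilityPack;

static class NodeReplacer
{
    // Hypothetical helper: replaces oldNode with every top-level node
    // parsed from fragmentHtml, preserving their order.
    public static void ReplaceWithFragment(HtmlNode oldNode, string fragmentHtml)
    {
        // Parse the replacement markup in a scratch document so that all
        // of its top-level nodes are available, not just the first one.
        var fragment = new HtmlDocument();
        fragment.LoadHtml(fragmentHtml);

        var parent = oldNode.ParentNode;

        // Snapshot the children with ToList() before moving them.
        foreach (var child in fragment.DocumentNode.ChildNodes.ToList())
        {
            parent.InsertBefore(child, oldNode);
        }

        parent.RemoveChild(oldNode);
    }

    public static string Demo()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<b>bold_one</b><strong>strong</strong><b>bold_two</b>");

        // ToList() also keeps the newly inserted <b> nodes out of the loop.
        foreach (var item in doc.DocumentNode.Descendants("b").ToList())
        {
            ReplaceWithFragment(item, "some text <b>node</b> <strong>another node</strong>");
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling NodeReplacer.Demo() should return the document markup with each original b element replaced by the three new nodes. Because the query is materialized before the loop starts, the b elements introduced by the replacement are not themselves replaced.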