GetElementsByTagName in HtmlAgilityPack

Ali · Apr 21, 2012 · Viewed 32.7k times

How do I select an element, e.g. a textbox, if I don't know its id?

If I know its id then I can simply write:

HtmlAgilityPack.HtmlNode node = doc.GetElementbyId(id);

But I don't know the textbox's id, and I can't find a GetElementsByTagName method in HtmlAgilityPack like the one available in the WebBrowser control. With the WebBrowser control I could simply have written:

HtmlElementCollection elements = browser[i].Document.GetElementsByTagName("form");
foreach (HtmlElement currentElement in elements)
{
    // work with currentElement here
}

EDIT

Here is the HTML form I am talking about

<form id="searchform" method="get" action="/test.php">
<input name="sometext" type="text">
</form>

Please note that I don't know the id of the form, and there can be several forms on the same page. The only thing I know is the name "sometext", and I want to get the element using just that name. So I guess I will have to go through all the forms one by one and look for the name "sometext" in each, but how do I do that?

Answer

jessehouwing · Apr 22, 2012

If you're looking for the tag by its tagName (such as form for <form name="someForm">), then you can use:

var forms = doc.DocumentNode.Descendants("form");

If you're looking for the tag by its name attribute (such as someForm for <form name="someForm">), then you need to compare the attribute value rather than node.Name, which holds the tag name:

var forms = doc.DocumentNode.Descendants().Where(node => node.GetAttributeValue("name", "") == "someForm");

For both cases you could create simple extension methods:

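using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlNodeExtensions
{
    // Elements whose name attribute equals the given value, e.g. <input name="sometext">.
    public static IEnumerable<HtmlNode> GetElementsByName(this HtmlNode parent, string name)
    {
        return parent.Descendants().Where(node => node.GetAttributeValue("name", "") == name);
    }

    // Elements with the given tag name, e.g. "form" or "input".
    public static IEnumerable<HtmlNode> GetElementsByTagName(this HtmlNode parent, string name)
    {
        return parent.Descendants(name);
    }
}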
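
For the form in the question, where only the input's name ("sometext") is known, a minimal sketch could look like the following. It assumes the HtmlAgilityPack NuGet package is referenced; FindByNameDemo and FindInputByName are hypothetical names used for illustration, not part of the library.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class FindByNameDemo
{
    // Hypothetical helper: returns the first <input> whose name attribute
    // equals the given value, or null if there is no such element.
    public static HtmlNode FindInputByName(HtmlDocument doc, string name)
    {
        return doc.DocumentNode
                  .Descendants("input")
                  .FirstOrDefault(n => n.GetAttributeValue("name", "") == name);
    }

    public static void Main()
    {
        // The form from the question, loaded from a string.
        var doc = new HtmlDocument();
        doc.LoadHtml("<form id=\"searchform\" method=\"get\" action=\"/test.php\">" +
                     "<input name=\"sometext\" type=\"text\"></form>");

        HtmlNode input = FindInputByName(doc, "sometext");
        Console.WriteLine(input != null
            ? "found input of type " + input.GetAttributeValue("type", "")
            : "not found");
    }
}
```

This searches the whole document instead of walking the forms one by one, so the form's id is never needed. As far as I know it also sidesteps a quirk in some older HtmlAgilityPack releases, where inputs are not nested under their form node by default.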

Note: You can also use SelectNodes and XPath to query your document:

var nodes = doc.DocumentNode.SelectNodes("//form//input");

This gives you all inputs on the page that are inside a form tag. Note that SelectNodes returns null, not an empty collection, when nothing matches.

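var nodes = doc.DocumentNode.SelectNodes("(//form)[1]//input");

This gives you all the inputs of the first form on the page. The parentheses matter: //form[1] without them matches every form that is the first form child of its parent, not only the first form in the document.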
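
XPath can also match on the name attribute directly, which fits the case in the question where only "sometext" is known. The following is a sketch under the same assumption that the HtmlAgilityPack NuGet package is referenced; XPathByNameDemo and FindInputByName are hypothetical names, not library members.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathByNameDemo
{
    // Hypothetical helper: returns the first <input> with the given name
    // attribute, or null (SelectSingleNode returns null when nothing matches).
    public static HtmlNode FindInputByName(HtmlDocument doc, string name)
    {
        // The name is spliced into a quoted XPath literal, so reject a
        // value that would break out of the quotes.
        if (name.Contains("'"))
            throw new ArgumentException("name must not contain a single quote", "name");

        return doc.DocumentNode.SelectSingleNode("//input[@name='" + name + "']");
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<form id=\"searchform\" method=\"get\" action=\"/test.php\">" +
                     "<input name=\"sometext\" type=\"text\"></form>");

        HtmlNode input = FindInputByName(doc, "sometext");
        if (input == null)
            Console.WriteLine("not found");
        else
            Console.WriteLine("found: " + input.OuterHtml);
    }
}
```

Because the expression starts at //input rather than //form//input, it does not depend on which form contains the element or on how many forms the page has.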