Select elements with attribute data-url using HTMLAgilityPack

Joe Black · Jul 10, 2012 · Viewed 10.1k times

I'm writing a small download robot that follows links into lower levels of a site on its own.

What I need to find are all the links in an HTML page: links to .jpg files as well as links to .pgn, .pdf, .html, and other files.

I'm using the HtmlAgilityPack to find all a-href links.

Sample code:

foreach (HtmlNode link in htmlDocument.DocumentNode.SelectNodes("//a[@href]"))
{
    HtmlAttribute attribute = link.Attributes["href"];
    links.Add(attribute.Value);
}

But I want to find the data-url attributes as well.

What XPath syntax do I have to use to find them? An example of a data-url in HTML code:

    <div class="cbreplay" data-url="2012\edmonton\partien.pgn"></div>

I need the "2012\edmonton\partien.pgn" out of this example. How can I do this with XPath syntax?

Best regards. If I made any serious mistakes, please tell me; this is my first question ever.

Answer

dash · Jul 10, 2012

The following should do what you want:

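HtmlNodeCollection divNodes = htmlDocument.DocumentNode.SelectNodes("//div[@data-url]");
if (divNodes != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode divNode in divNodes)
    {
        HtmlAttribute attribute = divNode.Attributes["data-url"];
        links.Add(attribute.Value);
    }
}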

The XPath expression //div[@data-url] selects every div element that has a data-url attribute. We then read that attribute's value from each matching node.

If there are nodes other than divs with this attribute, then //*[@data-url] should do the trick.
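
To collect both kinds of link in one pass, the two queries can be combined. The following is only a sketch: the class name LinkCollector and the helpers CollectLinks and AddAttributeValues are hypothetical names made up for this example, not part of HtmlAgilityPack. It assumes the page is already available as a string, and it guards against SelectNodes returning null when an expression matches nothing.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class for illustration; not part of HtmlAgilityPack.
public static class LinkCollector
{
    // Returns the href of every <a> element, followed by the data-url
    // of every element (of any tag name) that carries that attribute.
    public static List<string> CollectLinks(string html)
    {
        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html);

        var links = new List<string>();
        AddAttributeValues(htmlDocument, "//a[@href]", "href", links);
        AddAttributeValues(htmlDocument, "//*[@data-url]", "data-url", links);
        return links;
    }

    private static void AddAttributeValues(
        HtmlDocument doc, string xpath, string attributeName, List<string> links)
    {
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return;
        }

        foreach (HtmlNode node in nodes)
        {
            // The XPath predicate guarantees the attribute exists on each node.
            links.Add(node.Attributes[attributeName].Value);
        }
    }
}
```

Calling LinkCollector.CollectLinks on a page that contains the div from the question would include 2012\edmonton\partien.pgn in the returned list, alongside any a-href values.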