I want to use the HTML agility pack to parse image and href links from a HTML page,but I just don't know much about XML or XPath.Though having looking up help documents in many web sites,I just can't solve the problem.In addition,I use C# in VisualStudio 2005.And I just can't speak English fluently,so,I will give my sincere thanks to the one can write some helpful codes.
The first example on the home page does something very similar, but consider:
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm"); // would need doc.LoadHtml(htmlSource) if it is not a file
foreach(HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href"])
{
string href = link["href"].Value;
// store href somewhere
}
So you can imagine that for img@src, just replace each a
with img
, and href
with src
.
You might even be able to simplify to:
foreach(HtmlNode node in doc.DocumentElement
.SelectNodes("//a/@href | //img/@src")
{
list.Add(node.Value);
}
For relative url handling, look at the Uri
class.