Scrape a table from a web page in C#

Michael · Nov 15, 2011

What is the best approach to building a function that scrapes an HTML table on a web page into a variable?

I want to be able to pass it a unique identifier (such as the table's id attribute) and have it return all of the table's data in something like a DataTable.

Answer

BrokenGlass · Nov 15, 2011

You can use HtmlAgilityPack (HAP) to parse the HTML and extract the table data.
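Before querying anything you need an HtmlDocument. A minimal sketch of the two usual ways to get one, assuming the HtmlAgilityPack NuGet package is referenced; the class and method names (HapLoader, FromString, FromUrl) are hypothetical, while LoadHtml and HtmlWeb.Load are HAP's own calls:

```csharp
// Requires the HtmlAgilityPack NuGet package.
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class HapLoader
{
    // Parse markup you already have in memory (fetched with HttpClient, read from a file, ...).
    public static HtmlDocument FromString(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    // Let HAP download and parse the page itself (a blocking HTTP GET).
    public static HtmlDocument FromUrl(string url)
    {
        var web = new HtmlWeb();
        return web.Load(url);
    }
}
```

Loading from a string keeps the HTTP side under your control (headers, cookies, async), which is usually easier to test.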

Since HAP now supports LINQ, you could start with something like this:

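using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = ...; // load the page here
HtmlNode myTable = doc.DocumentNode
                      .Descendants("table")
                      // GetAttributeValue returns "" instead of throwing
                      // when a table has no id attribute
                      .FirstOrDefault(t => t.GetAttributeValue("id", "") == someTableId);

if (myTable != null)
{
    // further parsing here
}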
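The "further parsing" step could look roughly like the sketch below, which returns the DataTable the question asks for. It assumes the HtmlAgilityPack NuGet package, that the first row of the table holds the headers, and that every value can stay a string; TableScraper, ScrapeTable and CellTexts are hypothetical names, not part of HAP.

```csharp
// Requires the HtmlAgilityPack NuGet package.
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class TableScraper
{
    // Finds the <table> whose id attribute equals tableId and copies its cells
    // into a DataTable of string columns. Returns null if no such table exists.
    public static DataTable ScrapeTable(string html, string tableId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode table = doc.DocumentNode
                            .Descendants("table")
                            .FirstOrDefault(t => t.GetAttributeValue("id", "") == tableId);
        if (table == null)
            return null;

        // Rows of this table (including any inside <thead>/<tbody>/<tfoot>),
        // skipping rows that belong to a table nested inside one of its cells.
        List<HtmlNode> rows = table.Descendants("tr")
                                   .Where(tr => tr.Ancestors("table").First() == table)
                                   .ToList();

        var dt = new DataTable(tableId);
        if (rows.Count == 0)
            return dt;

        // Assumption: the first row holds the column headers.
        foreach (string name in CellTexts(rows[0]))
        {
            // Blank or repeated header text would collide, so pass "" and let
            // DataTable generate a default name ("Column1", "Column2", ...).
            bool usable = name.Length > 0 && !dt.Columns.Contains(name);
            dt.Columns.Add(usable ? name : "");
        }

        foreach (HtmlNode tr in rows.Skip(1))
        {
            List<string> values = CellTexts(tr);

            // Widen the table if a row has more cells than the header row.
            while (dt.Columns.Count < values.Count)
                dt.Columns.Add("");

            DataRow row = dt.NewRow();
            for (int i = 0; i < values.Count; i++)
                row[i] = values[i];
            dt.Rows.Add(row);
        }
        return dt;
    }

    // Decoded, trimmed text of a row's own <th>/<td> cells, in document order.
    private static List<string> CellTexts(HtmlNode tr) =>
        tr.ChildNodes
          .Where(n => n.Name == "th" || n.Name == "td")
          .Select(n => WebUtility.HtmlDecode(n.InnerText).Trim())
          .ToList();
}
```

Usage, assuming `html` already holds the page markup: `DataTable dt = TableScraper.ScrapeTable(html, "myTableId");`. Keeping every column as a string avoids guessing types from scraped text; convert the columns you care about afterwards. Colspan/rowspan are not handled.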