Monday, March 4, 2019

C#: Parsing HTML Table and Loading HTML Webpage using Html Agility Pack


Add HTML Agility Pack to your application using NuGet. To install it in your project, run the following command in the Package Manager Console.

> Install-Package HtmlAgilityPack
 
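After adding the package via NuGet, import its namespace in your source file. The function below also uses LINQ and generic collections, so it needs System.Linq and System.Collections.Generic as well.

> using System.Collections.Generic;
> using System.Linq;
> using HtmlAgilityPack;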
 
The function below loads a web page and converts one of its HTML tables into a list of rows, where each row is a list of cell strings. You only need to pass the table's class name and the page URL.

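public List<List<string>> ScrapHtmlTable(string className, string pageURL)
{
    HtmlWeb web = new HtmlWeb();
    HtmlDocument document = web.Load(pageURL);

    // className must equal the table's entire class attribute value.
    HtmlNode table =
      document.DocumentNode.SelectSingleNode("//table[@class='" + className + "']");

    // SelectSingleNode returns null when no table matches, so return an empty result.
    if (table == null)
        return new List<List<string>>();

    List<List<string>> parsedTbl = table
      .Descendants("tr")
      .Skip(1) // Skip the table header row
      .Where(tr => tr.Elements("td").Count() > 1) // Ignore rows with fewer than two cells
      .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
      .ToList();

    return parsedTbl;
}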
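To try the same row and cell extraction without a network call, the HTML can be loaded from a string instead of a URL. The following is a minimal sketch, not part of the original function: the class name TableParserSketch, the method ParseHtmlTable, and the sample markup are all hypothetical, and it assumes the HtmlAgilityPack package is installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableParserSketch
{
    // Hypothetical helper: parses the first table whose class attribute
    // equals className from an in-memory HTML string.
    public static List<List<string>> ParseHtmlTable(string html, string className)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        HtmlNode table = document.DocumentNode
            .SelectSingleNode("//table[@class='" + className + "']");

        // SelectSingleNode returns null when nothing matches.
        if (table == null)
        {
            return new List<List<string>>();
        }

        return table.Descendants("tr")
            .Skip(1) // skip the header row
            .Where(tr => tr.Elements("td").Count() > 1)
            .Select(tr => tr.Elements("td")
                .Select(td => td.InnerText.Trim())
                .ToList())
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample markup, used only for illustration.
        string html = @"<table class='prices'>
            <tr><th>Item</th><th>Price</th></tr>
            <tr><td>Tea</td><td>3</td></tr>
            <tr><td>Coffee</td><td>4</td></tr>
        </table>";

        // Print each parsed row with its cells separated by " | ".
        foreach (List<string> row in TableParserSketch.ParseHtmlTable(html, "prices"))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

Each inner list is one table row, so the result can be indexed as rows[rowIndex][columnIndex].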
 
 
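Example call. The first argument must match the table's full class attribute exactly, so a table with two classes is passed with both names separated by a space: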

ScrapHtmlTable("className1 className2", "https://www.abc.xz");
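Because the XPath in the function compares the whole class attribute, the call fails to match if the page lists the classes in a different order or adds another one. One common XPath 1.0 workaround is to match a single class token instead. The sketch below assumes that approach; ClassTokenSketch, TableByClassToken, and FindTable are hypothetical names, and the token is assumed not to contain quotes or spaces.

```csharp
using HtmlAgilityPack;

public static class ClassTokenSketch
{
    // Hypothetical helper: builds an XPath that matches a table carrying
    // the given class token, whatever other classes it also has.
    // Assumes the token contains no quotes or whitespace.
    public static string TableByClassToken(string token)
    {
        return "//table[contains(concat(' ', normalize-space(@class), ' '), ' "
            + token + " ')]";
    }

    // Returns the first matching table node, or null if none matches.
    public static HtmlNode FindTable(string html, string token)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);
        return document.DocumentNode.SelectSingleNode(TableByClassToken(token));
    }
}
```

Padding the attribute value with spaces on both sides keeps a token such as "grid" from matching a longer class name like "grid-wide".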
