Parsing HTML with CsQuery

bluewonder · Feb 28, 2014 · Viewed 11.4k times

How can I retrieve the text of a div by its id using CsQuery?

For example,

<h3>
    <div id='type'>
        Room 1
    </div>
    <div id='price'>
        145
    </div>
</h3>

In this case I'd like to get the content of the divs with the ids type and price.

Answer

hutchonoid · Feb 28, 2014

OK, here is how to do this, with a full working example.

HTML

This is your HTML, including the invalid duplicate ids that you have no control over:

var html = @"<h3>
            <div id='lib_presta'>
                Chambre standard 1 pers du <span class=''>03/03/2014</span>  au <span class=''>05/03/2014 </span>
            </div>
            <div id='prix_presta'>
                127.76 &euro;
            </div>
        </h3><h3>
            <div id='lib_presta'>
                Chambre standard 2 pers du <span class=''>03/03/2014</span>  au <span class=''>05/03/2014 </span>
            </div>
            <div id='prix_presta'>
                227.76 &euro;
            </div>
        </h3>";

C# Code

This selects the DOM elements by id into two collections, one of descriptions and one of prices. It then pairs them up by position with Zip and projects each pair into a HotelAvailability object, using the description as HotelName and the price as Price.

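        // Requires: using System.Linq; using CsQuery;
        CQ dom = html;

        // The ids repeat, so each selector matches one element per <h3> block.
        var libs = dom["#lib_presta"];
        var prixs = dom["#prix_presta"];

        // Pair each description with its price by position.
        var list = libs.Zip(prixs, (lib, prix) => new HotelAvailability
        {
            HotelName = lib.InnerText.Trim(),
            Price = prix.InnerText.Trim()
        }).ToList();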
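The answer does not show the HotelAvailability type, so here is a minimal sketch of what it might look like, assuming two string properties named after the ones used in the projection. To keep it runnable without CsQuery, the Pair helper and the AvailabilityPairing class (both hypothetical names) take plain strings where the answer uses InnerText values; the Zip call performs the same positional pairing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape: the original answer never shows this class.
public class HotelAvailability
{
    public string HotelName { get; set; }
    public string Price { get; set; }
}

public static class AvailabilityPairing
{
    // Hypothetical helper: pairs the i-th description with the i-th price.
    // Zip stops at the shorter sequence, so unmatched items are dropped.
    public static List<HotelAvailability> Pair(
        IEnumerable<string> descriptions, IEnumerable<string> prices)
    {
        return descriptions
            .Zip(prices, (d, p) => new HotelAvailability
            {
                HotelName = d.Trim(),
                Price = p.Trim()
            })
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var rooms = AvailabilityPairing.Pair(
            new[] { " Chambre standard 1 pers ", " Chambre standard 2 pers " },
            new[] { " 127.76 ", " 227.76 " });

        foreach (var room in rooms)
        {
            Console.WriteLine(room.HotelName + ": " + room.Price);
        }
    }
}
```

Because Zip pairs by position, this only gives sensible results when every description has exactly one matching price in the same order, which holds for the markup above.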

Run the C# code above in a console app that references the CsQuery NuGet package to test it.
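For the markup in the original question, where each id occurs only once, a shorter sketch may be enough. It assumes the CsQuery package is referenced and relies on CQ.Create, the selector indexer, and Text(); the class name QuestionExample and the helper ReadById are made up for illustration.

```csharp
using System;
using CsQuery;

public static class QuestionExample
{
    // Hypothetical helper: returns the trimmed, combined text of
    // every element that matches the given id.
    public static string ReadById(CQ dom, string id)
    {
        return dom["#" + id].Text().Trim();
    }

    public static void Main()
    {
        var html = @"<h3>
    <div id='type'>
        Room 1
    </div>
    <div id='price'>
        145
    </div>
</h3>";

        CQ dom = CQ.Create(html);

        Console.WriteLine(ReadById(dom, "type"));
        Console.WriteLine(ReadById(dom, "price"));
    }
}
```

Text() concatenates the text of all matched elements, so with duplicate ids the per-element approach shown in the answer is the safer choice.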