Jsoup find element with specific text

tbag · Aug 27, 2014 · Viewed 25.4k times

I want to select an element containing specific text from some HTML using jsoup. The HTML is:

<td style="vertical-align:bottom;text-align:center;width:15%">
<div style="background-color:#FFDD93;font-size:10px;margin:5px auto 0px auto;text-align:left;" class="genbg"><span class="corners-top-subtab"><span></span></span>
    <div><b>Pantry/Catering</b>
        <div>
            <div style="color:#00700B;">&#10003;&nbsp;Pantry Car Avbl
                <br />&#10003;&nbsp;Catering Avbl</div>
        </div>
        <div>
            <div><span>Dinner is served after departure from NZM on 1st day.;</span>...
                <br /><a style="font-size:10px;color:Red;" onClick="expandPost($(this).parent());" href="javascript:void(0);">Read more...</a>
            </div>
            <div style="display:none;">Dinner :2 chapati, rice, dal and chicken curry (NV) and paneer curry in veg &amp;Ice cream.; Breakfast:2 bread slices with jam and butter. ; Omlet of 2 eggs (Non veg),vada and sambar(veg)..; coffee &amp; lime juice</div>
        </div>
    </div><span class="corners-bottom-subtab"><span></span></span>
</div>

I want to find the div element containing the text "Pantry/Catering". I tried

doc.select("div:contains(Pantry/Catering)").first();

But this doesn't seem to work. How can I get this element using jsoup?

Answer

Spectre · Aug 27, 2014

When I run your code it selects the outer div, while I presume what you're looking for is the inner div. The documentation says that :contains selects the "elements that contains the specified text", and that includes text found in any of an element's descendants. In this simple HTML:

<div><div><b>Pantry/Catering</b></div></div>

The selector div:contains(Pantry/Catering) matches twice, because both divs contain the text 'Pantry/Catering':

<!-- First Match -->
<div><div><b>Pantry/Catering</b></div></div>

<!-- Second Match -->
<div><b>Pantry/Catering</b></div>
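
This is easy to check with a short program. The following is only a minimal sketch, assuming jsoup is on the classpath; the class name ContainsDemo and the helper matches() are made up for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class ContainsDemo {
    // Hypothetical helper: parses the simplified HTML and returns every div
    // whose text (including descendant text) contains "Pantry/Catering".
    static Elements matches() {
        Document doc = Jsoup.parse("<div><div><b>Pantry/Catering</b></div></div>");
        return doc.select("div:contains(Pantry/Catering)");
    }

    public static void main(String[] args) {
        Elements found = matches();
        System.out.println(found.size());                   // prints 2
        // The outer div's first child is the inner div ...
        System.out.println(found.get(0).child(0).tagName()); // prints div
        // ... while the inner div's first child is the <b> element.
        System.out.println(found.get(1).child(0).tagName()); // prints b
    }
}
```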

The matches are always in that order because jsoup returns them in document order, so an outer element comes before anything nested inside it. Therefore .first() always returns the outer div. To extract the inner div you could use .get(1).

Extracting the inner div in full:

doc.select("div:contains(Pantry/Catering)").get(1);
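
Note that .get(1) depends on how many enclosing divs there are. If that could change, one possible alternative (not something the question asked for) is to select the element that holds the text directly with :containsOwn and then step up to its parent. A sketch under the same assumptions as before; InnerDivDemo and innerDiv() are hypothetical names:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class InnerDivDemo {
    // Hypothetical helper: finds the <b> whose own text contains
    // "Pantry/Catering" and returns its parent element, or null if
    // there is no such <b>.
    static Element innerDiv(String html) {
        Document doc = Jsoup.parse(html);
        Element label = doc.select("b:containsOwn(Pantry/Catering)").first();
        return label == null ? null : label.parent();
    }

    public static void main(String[] args) {
        // Cut-down version of the markup from the question.
        String html = "<div class=\"genbg\"><div><b>Pantry/Catering</b>"
                + "<div>Pantry Car Avbl</div></div></div>";
        Element div = innerDiv(html);
        System.out.println(div.hasClass("genbg"));          // prints false
        System.out.println(div.parent().hasClass("genbg")); // prints true
    }
}
```

This only works while the text sits in a <b> directly inside the div you want, as it does in the question's HTML.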