XPath select all text content for a <div> except for a specific tag <h5>

bslima · Feb 27, 2013

I searched for and tried several solutions to this problem, but none of them worked. I have this HTML:

<div class="detalhes_colunadados">
   <div class="detalhescolunadados_blocos">
     <h5>Descrição completa</h5>
    Sala de estar/jantar,2 vagas de garagem cobertas.<br>
    </div>
    <div class="detalhescolunadados_blocos">
      <h5>Valores</h5>
            Venda: R$ 600.000,00<br>
          Condomínio: R$ 660,00<br>
    </div>
</div>

I want to use XPath to extract only the text content of the first div with class="detalhescolunadados_blocos", excluding anything inside the h5 tag.

I tried: //div[@class='detalhescolunadados_blocos']/[1]/*[not(self::h5)]

Answer

nwellnhof · Feb 27, 2013

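Your attempt fails for two reasons: /[1] is not valid XPath, because a predicate has to follow a node test, and * matches only element nodes, never text nodes. Try the following XPath expression instead: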

//div[@class='detalhescolunadados_blocos'][1]//text()[not(ancestor::h5)]

Run against your HTML in the xmllint shell, this returns:

$ xmllint --html --shell so.html
/ > xpath //div[@class='detalhescolunadados_blocos'][1]//text()[not(ancestor::h5)]    
Object is a Node Set :
Set contains 2 nodes:
1  TEXT
    content=      
2  TEXT
    content=     Sala de estar/jantar,2 vagas de gar...
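
The same expression can be tried from a script. The following is a minimal sketch, assuming the third-party lxml package is installed (it is not part of the standard library); SAMPLE, XPATH and extract_text are illustrative names, not part of the original question. Because the expression also matches whitespace-only text nodes, as the xmllint output shows, the sketch strips each node and drops the empty ones.

```python
from lxml import html  # third-party package (assumption): pip install lxml

# The HTML from the question, unchanged.
SAMPLE = """
<div class="detalhes_colunadados">
   <div class="detalhescolunadados_blocos">
     <h5>Descrição completa</h5>
    Sala de estar/jantar,2 vagas de garagem cobertas.<br>
    </div>
    <div class="detalhescolunadados_blocos">
      <h5>Valores</h5>
            Venda: R$ 600.000,00<br>
          Condomínio: R$ 660,00<br>
    </div>
</div>
"""

# All text nodes under the first matching div, except those inside an <h5>.
XPATH = "//div[@class='detalhescolunadados_blocos'][1]//text()[not(ancestor::h5)]"


def extract_text(markup):
    """Return the non-blank text nodes selected by XPATH, whitespace-trimmed."""
    tree = html.fromstring(markup)
    nodes = tree.xpath(XPATH)
    # Drop whitespace-only nodes and trim the indentation around the rest.
    return [node.strip() for node in nodes if node.strip()]


if __name__ == "__main__":
    print(extract_text(SAMPLE))
```

With the sample above, this should print a list containing only the description line, without the h5 heading text.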