How do I use XPath in Nokogiri?

Radek · Jan 17, 2010

I have not found any documentation or tutorial for this. Does anything like that exist?


doc.xpath('//table/tbody[@id="threadbits_forum_251"]/tr')

The code above will get me any table, anywhere, that has a tbody child with the attribute id equal to "threadbits_forum_251". But why does it start with a double //? And why is there a /tr at the end? See "Ruby Nokogiri Parsing HTML table II" for more details.


Can anybody tell me how to extract href, id, alt, src, etc., using Nokogiri?

'td[3]/div[1]/a/text()'   <--- extracts text

How can I extract other things?

Answer

Rubens Farias · Jan 17, 2010

It seems you need to read an XPath tutorial.

Your //table/tbody[@id="threadbits_forum_251"]/tr expression means:

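  • // - anywhere in the document, at any depth below the root
  • table/tbody - a tbody element that is a direct child of a table element
  • [@id="threadbits_forum_251"] - where the id attribute equals "threadbits_forum_251"
  • /tr - and then that tbody's tr child elements, so the expression returns the rows, not the table itself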

So, basically, you need to know:

  • attribute names begin with @
  • conditions (predicates) go inside [] brackets; a bare number, as in td[3], selects by position, counting from 1

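If I understand the API correctly, you can use at_xpath, which returns the first matching node, and read an attribute with []: row.at_xpath("td[3]/div[1]/a")["href"]. Alternatively, select the attribute in XPath itself with td[3]/div[1]/a/@href. Note that td[3]/... is a relative path, so evaluate it on a tr node (here called row) rather than on the whole document; calling [] with a string on the result of xpath does not work, because xpath returns a NodeSet, not a single node.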