I have not found any documentation or tutorial for that. Does anything like that exist?
doc.xpath('//table/tbody[@id="threadbits_forum_251"]/tr')
The code above will get me any table, anywhere, that has a tbody child with the attribute id equal to "threadbits_forum_251". But why does it start with a double //? Why is there /tr at the end? See "Ruby Nokogiri Parsing HTML table II" for more details.
Can anybody tell me how to extract href, id, alt, src, etc., using Nokogiri?
'td[3]/div[1]/a/text()' <--- extracts text
How can I extract other things?
It seems you need to read an XPath tutorial.
Your //table/tbody[@id="threadbits_forum_251"]/tr expression means:
// - anywhere in your XML document
table/tbody - take a table element with a tbody child
[@id="threadbits_forum_251"] - where the id attribute equals "threadbits_forum_251"
/tr - and take its tr elements

So, basically, you need to know:
attribute names begin with @
conditions go inside [] brackets

If I correctly understood that API, you can go with doc.at_xpath("td[3]/div[1]/a")["href"], or td[3]/div[1]/a/@href if there is just one <a> element.