I am using Nokogiri to analyze some HTML, but I don't know how to get the raw HTML inside a node.
For example, given:
<tr class="tableX">
<td align="center">
<font size="2"><a href="javascript:open('9746')">9746</a></font>
</td>
<td align="center">
<font size="2">2012-06-26</font>
</td>
</tr>
When I use this XPath selector:
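require 'nokogiri'

doc = Nokogiri::HTML(html)
# The class in the sample HTML is "tableX", so the selector must match it exactly.
nodes = doc.search("//tr[@class='tableX']")
nodes.each do |node|
  puts node.text # or node.content
end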
The results from node.text and node.content are:
9746
2012-06-26
I want to get all raw HTML inside the tr block, which, in this case, is:
<td align="center">
<font size="2"><a href="javascript:open('9746')">9746</a></font>
</td>
<td align="center">
<font size="2">2012-06-26</font>
</td>
What's the proper way to do that?
Use node.to_s, or just pass node to puts, which calls to_s for you:
# Select rows whose class matches the sample HTML ("tableX").
nodes = doc.search("//tr[@class='tableX']")
nodes.each do |node|
  puts node.to_s
  puts '-' * 40
end
With additional sanity-check HTML (yours, doubled, with a tr of a different class in the middle) I get:
<tr class="tableX">
<td align="center">
<font size="2"><a href="javascript:open('9746')">9746</a></font>
</td>
<td align="center"><font size="2">2012-06-26</font></td>
</tr>
----------------------------------------
<tr class="tableX">
<td align="center">
<font size="2"><a href="javascript:open('9746')">9746</a></font>
</td>
<td align="center"><font size="2">2012-06-26</font></td>
</tr>
----------------------------------------