How to get the raw HTML of a node

icn · Jun 23, 2012 · Viewed 13.8k times

I am using Nokogiri to analyze some HTML, but I don't know how to get the raw HTML inside a node.

For example, given:

<tr class="tableX">
  <td align="center">
    <font size="2"><a href="javascript:open('9746')">9746</a></font>
  </td>
  <td align="center">
    <font size="2">2012-06-26</font>
  </td>
</tr>

When I use this XPath selector:

require 'nokogiri'

doc = Nokogiri::HTML(html)

# The class must match the markup above ("tableX")
nodes = doc.search("//tr[@class='tableX']")

nodes.each do |node|
  puts node.text # or node.content
end

The results from node.text and node.content are:

9746
2012-06-26

I want to get all raw HTML inside the tr block, which, in this case, is:

<td align="center">
  <font size="2"><a href="javascript:open('9746')">9746</a></font>
</td>
<td align="center">
  <font size="2">2012-06-26</font>
</td>

What's the proper way to do that?

Answer

Dave Newton · Jun 23, 2012

Use node.to_s, or just pass node to puts, which calls to_s for you:

nodes = doc.search("//tr[@class='tableX']")
nodes.each do |node|
  puts node.to_s
  puts '-' * 40
end

With additional sanity-check HTML (yours, doubled, with a tr of a different class in the middle) I get:

<tr class="tableX">
<td align="center">
<font size="2"><a href="javascript:open('9746')">9746</a></font> 
            </td>
            <td align="center"><font size="2">2012-06-26</font></td>
</tr>
----------------------------------------
<tr class="tableX">
<td align="center">
<font size="2"><a href="javascript:open('9746')">9746</a></font> 
            </td>
            <td align="center"><font size="2">2012-06-26</font></td>
</tr>
----------------------------------------