How to get text and other tags between specific tags using the Jericho HTML parser?

insomiac · Apr 11, 2011 · Viewed 8.4k times

I have an HTML file that contains a specific element, e.g. one whose start tag is <TABLE cellspacing=0> and whose end tag is </TABLE>. I want to get everything between those two tags. I am using the Jericho HTML parser in Java to parse the HTML. Is it possible to get the text and other tags between specific tags with Jericho?

For example:

<TABLE  cellspacing=0>    
  <tr><td>HELLO</td>  
  <td>How are you</td></tr>
</TABLE>

Expected result:

<tr><td>HELLO</td>  
<td>How are you</td></tr> 

Answer

stevevls · Apr 11, 2011

Once you have found the Element of your table, all you have to do is call getContent().toString(). Here's a quick example using your sample HTML:

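// Imports assume Jericho 3.x package names.
import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.Source;

// Parse the sample HTML from the question.
Source source = new Source("<TABLE  cellspacing=0>\n" +
    "  <tr><td>HELLO</td>  \n" +
    "  <td>How are you</td></tr>\n" +
    "</TABLE>");

// The table is the first element in this snippet.
Element table = source.getFirstElement();
// getContent() covers everything between the start tag and the end tag.
String tableContent = table.getContent().toString();

System.out.println(tableContent);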

Output:

    <tr><td>HELLO</td>  
    <td>How are you</td></tr>
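
The example above works because the table is the first element in the snippet. When the table sits inside a larger page, you first have to find its Element. Here is a minimal sketch of one way to do that, assuming Jericho 3.x (package net.htmlparser.jericho); the class name TableContentExample and the helper tableContents are made up for illustration and are not part of the library:

```java
import java.util.ArrayList;
import java.util.List;

import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.Source;

// Hypothetical helper class, not part of Jericho.
public class TableContentExample {

    // Returns the markup between the start and end tag of every
    // <table> element found in the given HTML. Nested tables, if any,
    // are returned as separate entries as well.
    static List<String> tableContents(String html) {
        Source source = new Source(html);
        List<String> contents = new ArrayList<>();
        // Element names are matched case-insensitively, so <TABLE> is found too.
        for (Element table : source.getAllElements(HTMLElementName.TABLE)) {
            contents.add(table.getContent().toString());
        }
        return contents;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>intro</p>"
            + "<TABLE cellspacing=0><tr><td>HELLO</td><td>How are you</td></tr></TABLE>"
            + "</body></html>";
        for (String content : tableContents(html)) {
            System.out.println(content);
        }
    }
}
```

If you only need the first table, source.getFirstElement(HTMLElementName.TABLE) should do the same job without the loop.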