I have an HTML file that contains a specific tag, e.g. <TABLE cellspacing=0>, with the end tag </TABLE>. Now I want to get everything between those tags. I am using the Jericho HTML parser in Java to parse the HTML. Is it possible to get the text and other tags between specific tags with the Jericho parser?
For example:
<TABLE cellspacing=0>
<tr><td>HELLO</td>
<td>How are you</td></tr>
</TABLE>
Expected result:
<tr><td>HELLO</td>
<td>How are you</td></tr>
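Once you have found the Element for your table, all you have to do is call getContent().toString() on it. getContent() returns the segment between the element's start tag and end tag, so the result includes both the text and any nested tags. Here's a quick example using your sample HTML: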
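// Jericho 3.x package names; 2.x releases used au.id.jericho.lib.html instead.
import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.Source;

Source source = new Source("<TABLE cellspacing=0>\n" +
" <tr><td>HELLO</td> \n" +
" <td>How are you</td></tr>\n" +
"</TABLE>");
// The TABLE is the first element in this source.
Element table = source.getFirstElement();
// Everything between <TABLE ...> and </TABLE>, nested tags included.
String tableContent = table.getContent().toString();
System.out.println(tableContent);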
Output:
<tr><td>HELLO</td>
<td>How are you</td></tr>
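In a real file the table is usually not the first element, so you would look it up by name instead. The following is a minimal sketch of that, assuming Jericho 3.x on the classpath; the class name TableContentExample and the helper extractTableContent are made up for illustration and are not part of Jericho.

```java
import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.Source;

public class TableContentExample {

    // Hypothetical helper: returns the markup between the first table's
    // start tag and end tag, or null if the document has no table element.
    public static String extractTableContent(String html) {
        Source source = new Source(html);
        // HTMLElementName.TABLE is the lower-case name "table"; Jericho
        // normalises tag names, so an upper-case <TABLE> tag matches too.
        Element table = source.getFirstElement(HTMLElementName.TABLE);
        return table == null ? null : table.getContent().toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Intro</p>"
                + "<TABLE cellspacing=0><tr><td>HELLO</td><td>How are you</td></tr></TABLE>"
                + "</body></html>";
        // Prints the rows only, without the surrounding TABLE tags.
        System.out.println(extractTableContent(html));
    }
}
```

If the page has several tables, source.getAllElements(HTMLElementName.TABLE) returns all of them as a list, and you can call getContent() on whichever one you need.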