I want to parse HTML text in Java.
I tried parsing it with javax.swing.text.html.HTMLEditorKit, and that worked for getting data out of ordinary HTML. But my HTML data looks like this:
<span class="TitleServiceChange" >Service Change</span>
<span class="DateStyle">
&nbsp;Posted:&nbsp;12/16/2012&nbsp; 8:00PM
</span><br/><br/>
<P>
but with the tags surrounded by '&lt;' and '&gt;' instead of '<' and '>'.
When I parse the above text, I get this error:
Parsing error: start.missing body ? ? at
Please suggest how I can resolve this problem. Thanks in advance.
To unescape the full set of escaped characters in a string, you can use the Apache Commons Lang utility library.
Specifically, the StringEscapeUtils class provides the unescapeHtml4 method, among others. Once the string has been unescaped, it contains real '<' and '>' characters and can be handed to the HTML parser as usual.
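
To illustrate the idea, here is a minimal sketch that uses only the JDK. The class HtmlUnescapeSketch and the method unescapeBasic are hypothetical names made up for this example, and the method handles only five common entities. It is a stand-in for the library call, not a replacement: with Commons Lang 3 on the classpath, the equivalent call would be StringEscapeUtils.unescapeHtml4(escaped) from org.apache.commons.lang3, which covers the full HTML 4 entity set.

```java
// Illustration only: a hand-rolled stand-in for StringEscapeUtils.unescapeHtml4.
// HtmlUnescapeSketch and unescapeBasic are hypothetical names, not library API.
class HtmlUnescapeSketch {

    // Unescapes five common entities; anything else is left untouched.
    static String unescapeBasic(String escaped) {
        return escaped
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&nbsp;", "\u00A0")
                // "&amp;" goes last so that "&amp;lt;" becomes "&lt;" and not "<".
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        // Escaped form of the first line of the sample data in the question.
        String escaped =
                "&lt;span class=&quot;TitleServiceChange&quot; &gt;Service Change&lt;/span&gt;";

        // Unescape first, then pass the result to the HTML parser.
        String html = unescapeBasic(escaped);
        System.out.println(html);
    }
}
```

The unescaped string is what should be passed to HTMLEditorKit (or any other parser), since the parser expects literal tags rather than entity-encoded ones.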