HTML Agility Pack is an open-source HTML parser that builds a read/write DOM and supports LINQ, plain XPath, or XSLT.
I have this ill-formed HTML with overlapping tags: <p>word1<b>word2</p> <…