Cleanup HTML with PHP to create clean string

Rein · May 4, 2012

I've got a bunch of HTML data that I'm writing to a PDF file using PHP. In the PDF, I want all of the HTML stripped out and the remaining text cleaned up. For instance:

<ul>
    <li>First list item</li>
    <li>Second list item which is quite a bit longer</li>
    <li>List item with apostrophe 's 's</li>
</ul>

Should become:

First list item
Second list item which is quite a bit longer
List item with apostrophe 's 's

However, if I simply use strip_tags(), I get something like this:

   First list item&#8232;

   Second list item which is quite a bit
longer&#8232;

   List item with apostrophe &rsquo;s &rsquo;s

Also note the indentation of the output.
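A minimal sketch that reproduces the problem is below. It assumes, as the &rsquo; and &#8232; in the output suggest, that the source HTML contains those entities literally; the $html sample is a hypothetical two-item version of the list above.

```php
<?php
// Hypothetical sample: the source HTML is assumed to contain the entities literally.
$html = "<ul>\n    <li>First list item&#8232;</li>\n    <li>List item with apostrophe &rsquo;s</li>\n</ul>";

// strip_tags() removes the markup only; whitespace and entities are left as-is.
$stripped = strip_tags($html);

// Print each line in brackets so the leftover indentation is visible.
foreach (explode("\n", $stripped) as $line) {
    echo '[' . $line . "]\n";
}
```

strip_tags() does not touch the text between the tags, so the newlines and indentation that separated the <li> elements, and the entities inside them, come through unchanged.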

Any tips on how to properly clean up the HTML into nice, clean strings without messy whitespace and odd characters?

Thanks :)

Answer

xCander · May 4, 2012

The odd characters appear to be HTML entities. Try:

// Remove the tags, then decode entities such as &rsquo; and &#8232; as UTF-8
html_entity_decode( strip_tags( $my_html_code ), ENT_QUOTES, 'UTF-8' );
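Decoding the entities fixes the odd characters but not the indentation and blank lines the question also mentions. Below is a minimal sketch that handles both. It assumes UTF-8 input and PHP 7 or later. html_to_clean_text() is a hypothetical helper name, and treating U+2028/U+2029 as line breaks and U+00A0 as a plain space is an assumption about what "clean" should mean here.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): converts an HTML fragment into
 * plain-text lines. Assumes the input is UTF-8 encoded.
 */
function html_to_clean_text(string $html): string
{
    // Remove the tags, then decode entities such as &rsquo; and &#8232;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Assumption: treat the Unicode line/paragraph separators as line breaks
    // and the non-breaking space as an ordinary space.
    $text = str_replace(["\u{2028}", "\u{2029}", "\u{00A0}"], ["\n", "\n", ' '], $text);

    $clean = [];
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        // Collapse runs of spaces/tabs, then trim both ends of the line.
        $line = trim(preg_replace('/[ \t]+/', ' ', $line));
        if ($line !== '') {
            $clean[] = $line; // keep only non-empty lines
        }
    }
    return implode("\n", $clean);
}

// Hypothetical sample modelled on the list in the question.
$html = "<ul>\n    <li>First list item&#8232;</li>\n"
      . "    <li>Second list item which is quite a bit longer&#8232;</li>\n"
      . "    <li>List item with apostrophe &rsquo;s &rsquo;s</li>\n</ul>";

echo html_to_clean_text($html), "\n";
```

Decoding &rsquo; yields the typographic apostrophe (’, U+2019), not a plain '. If the PDF output needs ASCII, add one more str_replace() for that character.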