How can I decode HTML entities?

Frank picture Frank · Feb 23, 2009 · Viewed 36.6k times · Source

Here's a quick Perl question:

How can I convert HTML special characters like ü or ' to normal ASCII text?

I started with something like this:

s/\&#(\d+);/chr($1)/eg;

and could write it for all HTML characters, but some function like this probably already exists?

Note that I don't need a full HTML->Text converter. I already parse the HTML with the HTML::Parser. I just need to convert the text with the special chars I'm getting.

Answer

Telemachus picture Telemachus · Feb 23, 2009

Take a look at HTML::Entities:

use HTML::Entities;

my $html = "Snoopy & Charlie Brown";

print decode_entities($html), "\n";

You can guess the output.