How can I strip HTML in a string using Perl?

html regex perl strip

ParoX · Jul 1, 2009 · Viewed 30.7k times · Source

Is there anyway easier than this to strip HTML from a string using Perl?

$Error_Msg =~ s|<b>||ig;
$Error_Msg =~ s|</b>||ig;
$Error_Msg =~ s|<h1>||ig;
$Error_Msg =~ s|</h1>||ig;
$Error_Msg =~ s|<br>||ig;

I would appreicate both a slimmed down regular expression, e.g. something like this:

$Error_Msg =~ s|</?[b|h1|br]>||ig;

Is there an existing Perl function that strips any/all HTML from a string, even though I only need bolds, h1 headers and br stripped?

Answer

Assuming the code is valid HTML (no stray < or > operators)

$htmlCode =~ s|<.+?>||g;

If you need to remove only bolds, h1's and br's

$htmlCode =~ s#</?(?:b|h1|br)\b.*?>##g

And you might want to consider the HTML::Strip module