Classic ASP (VBScript) convert HTML codes to plain text

Dan Ellis · May 24, 2011 · Viewed 24k times

I'm trying to convert HTML character codes such as &#XXXX; (where XXXX is a number) to plain text using Classic ASP (VBScript).

I'm adding the text to a plain-text email, and when the text contains these HTML codes, the email shows the raw codes instead of the characters they represent.

One fix would be to switch the email to HTML format, which solves this problem but causes other problems for my email that I won't go into.

Is there a built-in function, or a custom function, I can use to convert these HTML codes to plain text?

Answer

C. Ross · May 24, 2011

What you need is an HTML decode function, though unfortunately Classic ASP doesn't include one (it provides Server.HTMLEncode but no decoding counterpart).

This function, found on ASP Nut and heavily modified by me, should do what you need. I tested it as VBScript running on my local computer and it seemed to work well, even with Unicode characters whose codes are above 1000.

Function HTMLDecode(sText)
    Dim regEx
    Dim matches
    Dim match

    'Decode the common named entities. &amp; is handled last (below) so that
    'text such as "&amp;#65;" decodes to "&#65;" rather than to "A".
    sText = Replace(sText, "&quot;", Chr(34))
    sText = Replace(sText, "&lt;"  , Chr(60))
    sText = Replace(sText, "&gt;"  , Chr(62))
    sText = Replace(sText, "&nbsp;", Chr(32))

    Set regEx = New RegExp

    With regEx
        .Pattern = "&#(\d+);" 'Match decimal numeric character references
        .Global = True
    End With

    Set matches = regEx.Execute(sText)

    'Iterate over matches
    For Each match In matches
        'Replace the whole match with the character for the captured digits.
        'ChrW raises an error for values above 65535.
        sText = Replace(sText, match.Value, ChrW(CLng(match.SubMatches(0))))
    Next

    sText = Replace(sText, "&amp;", Chr(38))

    HTMLDecode = sText
End Function

Note: You'll need VBScript version 5.0 or later installed on your server to use the RegExp object.
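
To check what the pattern captures before using the function in a page, a small standalone script can help. The following is only a sketch, assuming it is saved as a .vbs file and run with cscript.exe outside of ASP; the sample string and variable names are made up for illustration.

```vbscript
Option Explicit

Dim regEx, matches, match, sample

'Hypothetical sample text containing two decimal character references
sample = "Price: &#8364;10 &#169; 2011"

Set regEx = New RegExp
regEx.Pattern = "&#(\d+);" 'Same pattern as in HTMLDecode
regEx.Global = True        'Find every occurrence, not just the first

Set matches = regEx.Execute(sample)

For Each match In matches
    'match.Value is the whole reference; SubMatches(0) holds only the digits
    WScript.Echo match.Value & " -> " & match.SubMatches(0)
Next
```

With this sample the script prints one line per reference, "&#8364; -> 8364" and "&#169; -> 169", and those digits are what HTMLDecode passes to ChrW.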