I'm trying to convert HTML Codes like the &#XXXX; (where XXXX is a number) to plain text using classic ASP (VBScript).
I'm adding the text to an email which is in plain text format and if I add them as HTML Codes, it just displays the code and doesn't convert them.
One fix would be to change the email to be HTML which does fix that problem but then causes other problems for my email which I won't go into.
Is there a built in function or a custom function I can use to convert these HTML Codes to plain text?
What you need is HTML Decode, though unfortunately ASP doesn't include one.
This function, found on ASP Nut, and modified heavily by me, should do what you need. I tested it as vbscript running on my local computer and it seemed to work well, even with Unicode symbols in the 1000+ range.
Function HTMLDecode(sText)
Dim regEx
Dim matches
Dim match
sText = Replace(sText, """, Chr(34))
sText = Replace(sText, "<" , Chr(60))
sText = Replace(sText, ">" , Chr(62))
sText = Replace(sText, "&" , Chr(38))
sText = Replace(sText, " ", Chr(32))
Set regEx= New RegExp
With regEx
.Pattern = "&#(\d+);" 'Match html unicode escapes
.Global = True
End With
Set matches = regEx.Execute(sText)
'Iterate over matches
For Each match in matches
'For each unicode match, replace the whole match, with the ChrW of the digits.
sText = Replace(sText, match.Value, ChrW(match.SubMatches(0)))
Next
HTMLDecode = sText
End Function
Note: You'll need script version 5.0 installed on your server to use the RegExp object.