Using ASP.NET, how can I strip the HTML tags from a given string reliably (i.e. not using regex)? I am looking for something like PHP's strip_tags
.
<ul><li>Hello</li></ul>
"Hello"
I am trying not to reinvent the wheel, but I have not found anything that meets my needs so far.
If it is just stripping all HTML tags from a string, this works reliably with regex as well. Replace:
<[^>]*(>|$)
with the empty string, globally. Don't forget to normalize the string afterwards, replacing:
[\s\r\n]+
with a single space, and trimming the result. Optionally replace any HTML character entities back to the actual characters.
Note:
>
in attribute values. This solution will return broken markup when encountering such values.