Using C# regular expressions to remove HTML tags

Steve · Apr 25, 2009 · Viewed 205.3k times

How do I use a C# regular expression to replace or remove all HTML tags, including the angle brackets? Can someone please help me with the code?

Answer

Daniel Brückner · Apr 25, 2009

As has often been said, you should not use regular expressions to process XML or HTML documents. They do not work well with HTML and XML because regular expressions have no general way to express arbitrarily nested structures.

That said, for simple cases you could use the following:

using System.Text.RegularExpressions;
String result = Regex.Replace(htmlDocument, @"<[^>]*>", String.Empty); // drop every "<...>" run
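If the one-liner is called repeatedly, it can be wrapped in a small reusable helper. The sketch below is only an illustration: the class name `HtmlTagStripper` and method name `StripTags` are hypothetical, and it uses the same `<[^>]*>` pattern, so it shares the same limitations.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class HtmlTagStripper
{
    // Same pattern as the one-liner: '<', then any characters except '>', then '>'.
    // Kept in a static field so the pattern is parsed once and reused;
    // RegexOptions.Compiled trades some startup cost for faster matching.
    private static readonly Regex TagPattern = new Regex("<[^>]*>", RegexOptions.Compiled);

    // Returns the input with every "<...>" run removed.
    // Regex.Replace throws ArgumentNullException if html is null.
    public static string StripTags(string html)
    {
        return TagPattern.Replace(html, string.Empty);
    }
}
```

For example, `HtmlTagStripper.StripTags("<p>Hello <b>world</b></p>")` returns `"Hello world"`. `RegexOptions.Compiled` is only worth it if the helper is called often; for a one-off call the plain static `Regex.Replace` is simpler.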

This works in most cases, but there are cases (for example, CDATA sections containing angle brackets) where it will not work as expected.
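To make the failure concrete, the sketch below runs the same `<[^>]*>` pattern on a few made-up inputs. The class name `TagRegexPitfalls` and the sample strings are hypothetical. In each case the match stops at the first `>` it meets, even when that `>` is part of the content rather than the end of a tag.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the name and the sample inputs are illustrative only.
public static class TagRegexPitfalls
{
    // Same pattern as in the answer.
    public static string Strip(string html) =>
        Regex.Replace(html, "<[^>]*>", string.Empty);

    // Prints each sample next to what the pattern leaves behind.
    public static void PrintExamples()
    {
        string[] samples =
        {
            "<![CDATA[ a > b ]]>",       // '>' inside CDATA ends the match early; leaves: " b ]]>"
            "<!-- a > b -->",            // same problem inside a comment; leaves: " b -->"
            "<a title=\"x>y\">link</a>"  // '>' inside a quoted attribute value; leaves: y">link
        };

        foreach (string sample in samples)
        {
            Console.WriteLine($"{sample}  =>  {Strip(sample)}");
        }
    }
}
```

A real HTML or XML parser avoids these problems because it tracks where CDATA sections, comments, and quoted attribute values begin and end, which a single "up to the next `>`" rule cannot do.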