Microsoft's AntiXSS (v4.2.1) Sanitizer.GetSafeHtmlFragment(string) method is removing <br> and <br /> tags from my input. Is this supposed to happen? Is there a way around it?
It seems to be removing \n and \r characters too, so I cannot call Replace() after the sanitizer has done its job.
The 4.2.x release was motivated by a security vulnerability found precisely in the HTML sanitizer. More information about this fact:
However, it seems that in addition to fixing the vulnerability, the sanitizer was made much more aggressive, to the point of being almost unusable. An issue about this has been reported on the WPL CodePlex site ("GetSafeHtmlFragment replacing all html tags").
If your only problem is the <br> tag and you want to stick with the AntiXSS sanitizer, you can implement an ugly workaround: pre-process your input, then post-process the result of the sanitizer.
Something like this (code for illustrative purposes only):
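using System;
using Microsoft.Security.Application; // AntiXSS 4.2.x

class Program
{
    // Plain-text placeholder, expected to pass through the sanitizer unchanged.
    const string BrMarker = "|br|";

    static void Main(string[] args)
    {
        string input = "<br>Hello<br/>World!";
        input = EscapeHtmlBr(input);                  // hide <br> tags from the sanitizer
        var result = Sanitizer.GetSafeHtmlFragment(input);
        result = UnescapeHtmlBr(result);              // restore them afterwards
        Console.WriteLine(result);
    }

    // Turns every marker back into a self-closing <br /> tag.
    private static string UnescapeHtmlBr(string result)
    {
        return result.Replace(BrMarker, "<br />");
    }

    // Replaces the three common <br> spellings with the marker.
    private static string EscapeHtmlBr(string input)
    {
        input = input.Replace("<br>", BrMarker);
        input = input.Replace("<br />", BrMarker);
        input = input.Replace("<br/>", BrMarker);
        return input;
    }
}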
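The exact-string Replace() calls in this workaround only catch three spellings, so input such as <BR>, <Br/> or <br  /> would still be stripped. A slightly more tolerant sketch swaps them for a case-insensitive regular expression (System.Text.RegularExpressions). It assumes, as the workaround does, that the marker survives GetSafeHtmlFragment untouched; the BrTag class and its Escape/Unescape names are hypothetical, and the sanitizer call is left as a comment so the snippet runs without AntiXSS installed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: hides line-break tags from the sanitizer by swapping
// them for a plain-text marker, then restores them afterwards.
public static class BrTag
{
    const string BrMarker = "|br|";

    // Matches <br>, <br/>, <br />, <BR>, <br   /> ... (any case, optional
    // whitespace, optional self-closing slash). Tags with attributes such as
    // <br class="x"> are NOT matched.
    static readonly Regex BrPattern =
        new Regex(@"<br\s*/?>", RegexOptions.IgnoreCase);

    public static string Escape(string input)
    {
        return BrPattern.Replace(input, BrMarker);
    }

    public static string Unescape(string sanitized)
    {
        return sanitized.Replace(BrMarker, "<br />");
    }

    public static void Main()
    {
        string escaped = Escape("<BR>Hello<br  />World!");
        Console.WriteLine(escaped);           // |br|Hello|br|World!
        // escaped = Sanitizer.GetSafeHtmlFragment(escaped); would run here
        Console.WriteLine(Unescape(escaped)); // <br />Hello<br />World!
    }
}
```

One consequence of the marker approach, in either version: a user who literally types |br| will get a real <br /> tag in the output. That is harmless for line breaks, but worth keeping in mind before reusing the trick for other tags.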