Is Sanitizer.GetSafeHtmlFragment supposed to remove <br> elements?

Chaddeus · Jul 8, 2012 · Viewed 9k times

MS's AntiXSS (v4.2.1) Sanitizer.GetSafeHtmlFragment(string) method is removing <br> and <br /> tags from my input. Is this supposed to happen? Is there a way around it?

It seems to be removing \n and \r characters too, so I cannot call Replace() after the sanitizer has done its job.

Answer

João Angelo · Jul 8, 2012

The 4.2.x release was motivated by a security vulnerability found in the HTML sanitizer itself.

However, besides fixing the vulnerability, the sanitizer seems to have been made much more aggressive, to the point of being almost unusable. There is a reported issue about this on the WPL CodePlex site ("GetSafeHtmlFragment replacing all html tags").

If your problem is only with the <br> tag and you want to stick with the AntiXSS sanitizer, you can implement an ugly workaround: pre-process your input to replace each <br> with a placeholder, and then post-process the sanitizer's output to turn the placeholder back into a tag.

Something like this (code for illustrative purposes only):

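using System;
using Microsoft.Security.Application; // AntiXSS namespace that contains Sanitizer

class Program
{
    // Placeholder that is assumed to pass through the sanitizer unchanged.
    const string BrMarker = "|br|";

    static void Main(string[] args)
    {
        string input = "<br>Hello<br/>World!";

        // Hide the <br> tags from the sanitizer, sanitize, then restore them.
        input = EscapeHtmlBr(input);
        var result = Sanitizer.GetSafeHtmlFragment(input);
        result = UnescapeHtmlBr(result);

        Console.WriteLine(result);
    }

    // Turns every marker back into a <br /> tag.
    private static string UnescapeHtmlBr(string result)
    {
        return result.Replace(BrMarker, "<br />");
    }

    // Replaces the common spellings of the <br> tag with the marker.
    // Note: the match is case-sensitive, so <BR> is not covered.
    private static string EscapeHtmlBr(string input)
    {
        return input.Replace("<br>", BrMarker)
                    .Replace("<br />", BrMarker)
                    .Replace("<br/>", BrMarker);
    }
}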
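
The workaround above only matches three exact, lower-case spellings of the tag. A slightly more tolerant variant of the same idea could use a case-insensitive regular expression, so that <BR>, <br > and similar forms are also preserved. This is only a sketch: the class name BrPreservingSanitizer and the idea of passing the sanitizer in as a delegate are hypothetical, chosen so the example runs without the AntiXSS assembly, and it still assumes (as the code above does) that the "|br|" marker survives sanitization untouched. The StripAllTags function is a crude stand-in for demonstration and is not a real sanitizer.

```csharp
using System;
using System.Text.RegularExpressions;

static class BrPreservingSanitizer
{
    // Assumption: this marker passes through the sanitizer unchanged.
    const string BrMarker = "|br|";

    // Matches attribute-less <br> tags in any case, with optional
    // whitespace and an optional self-closing slash: <br>, <BR/>, <br />.
    static readonly Regex BrTag =
        new Regex(@"<br\s*/?>", RegexOptions.IgnoreCase);

    // Hides <br> tags behind the marker, runs the supplied sanitizer,
    // then restores every marker as a normalized <br /> tag.
    public static string Sanitize(string input, Func<string, string> sanitize)
    {
        string escaped = BrTag.Replace(input, BrMarker);
        string cleaned = sanitize(escaped);
        return cleaned.Replace(BrMarker, "<br />");
    }
}

static class Demo
{
    // Stand-in for an aggressive sanitizer: removes anything tag-shaped.
    static string StripAllTags(string html)
    {
        return Regex.Replace(html, "<[^>]+>", string.Empty);
    }

    static void Main()
    {
        string input = "<BR>Hello<br/><b>World!</b>";
        Console.WriteLine(BrPreservingSanitizer.Sanitize(input, StripAllTags));
    }
}
```

With AntiXSS referenced, the call would presumably look like BrPreservingSanitizer.Sanitize(input, s => Sanitizer.GetSafeHtmlFragment(s)). A user who types the literal text |br| would get a line break in the output, which is harmless but worth knowing about; a less guessable marker would avoid that.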