How to properly sanitize content with AntiXss Library?

ojek · Sep 23, 2012 · Viewed 12.9k times

I have a simple forum application. When someone posts content, I do:

post.Content = Sanitizer.GetSafeHtml(post.Content);

Now, I am not sure if I am doing something wrong, or what is going on, but it allows almost no HTML. Even a simple <b></b> is too much for it. So I guess that tool is totally useless.

Now my question: can anyone tell me how I should sanitize my users' input so that they can post images (<img> tags), use bold and emphasis, etc.?

Answer

Steven · Sep 23, 2012

It seems that many people find the sanitizer rather useless. Instead of using the sanitizer, just encode everything, and decode safe parts back:

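using System.Collections.Generic;
using System.Linq;
using System.Text;

// Exact tag strings that may be restored after encoding (matching is case-sensitive).
private static readonly IEnumerable<string> WhitelistedTags =
    new[] { "<b>", "</b>", "<i>", "</i>" };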

private static readonly (string Encoded, string Decoded)[] DecodingPairs =
    WhitelistedTags
    .Select(tag => (Microsoft.Security.Application.Encoder.HtmlEncode(tag), tag))
    .ToArray();

public static string Sanitize(string html)
{
    // Encode the whole thing
    var safeHtml = Microsoft.Security.Application.Encoder.HtmlEncode(html);
    var builder = new StringBuilder(safeHtml);

    // Decode the safe parts
    foreach (var (encodedTag, decodedTag) in DecodingPairs)
    {
        builder.Replace(encodedTag, decodedTag);
    }

    return builder.ToString();
}
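
To see what this encode-then-decode approach lets through, here is a minimal self-contained sketch. It makes two assumptions that are not part of the answer: the class name WhitelistSanitizerDemo is hypothetical, and System.Net.WebUtility.HtmlEncode stands in for the AntiXSS encoder so the sample runs without the AntiXSS package. The exact entity output of the AntiXSS encoder may differ for some characters.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical demo class; WebUtility.HtmlEncode is a stand-in (assumption)
// for Microsoft.Security.Application.Encoder.HtmlEncode.
public static class WhitelistSanitizerDemo
{
    private static readonly string[] WhitelistedTags = { "<b>", "</b>", "<i>", "</i>" };

    // Pair each allowed tag with its encoded form so it can be restored later.
    private static readonly (string Encoded, string Decoded)[] DecodingPairs =
        WhitelistedTags.Select(tag => (WebUtility.HtmlEncode(tag), tag)).ToArray();

    public static string Sanitize(string html)
    {
        // Encode everything, then restore only the exact whitelisted tags.
        var builder = new StringBuilder(WebUtility.HtmlEncode(html));
        foreach (var (encodedTag, decodedTag) in DecodingPairs)
        {
            builder.Replace(encodedTag, decodedTag);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // Plain <b> survives; <script> stays encoded.
        Console.WriteLine(Sanitize("<b>hi</b><script>alert(1)</script>"));
        // prints: <b>hi</b>&lt;script&gt;alert(1)&lt;/script&gt;

        // A tag with attributes is not an exact match, so it stays encoded.
        Console.WriteLine(Sanitize("<b onclick=\"x\">hi</b>"));
        // prints: &lt;b onclick=&quot;x&quot;&gt;hi</b>
    }
}
```

Because only exact strings are restored, a tag that carries any attribute remains encoded. Its closing tag is still restored, so the output can contain an unmatched closing tag.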

Please note that it's nearly impossible to safely decode an IMG tag, since there are really simple ways for an attacker to abuse this tag. Examples:

<IMG SRC="javascript:alert('XSS');">

<IMG SRC=&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;&#97;&#108;&#101;&#114;&#116;&#40;&#39;&#88;&#83;&#83;&#39;&#41;>
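
The second example is the first one in disguise: the numeric character references spell out the same javascript: URL. The sketch below shows this by decoding the value. The class and method names are hypothetical, and System.Net.WebUtility.HtmlDecode is used only to illustrate the decoding a browser performs on the attribute value.

```csharp
using System;
using System.Net;

// Hypothetical demo class showing what the entity-encoded SRC value decodes to.
public static class EntityDecodeDemo
{
    // The SRC value from the second example, copied verbatim.
    public const string EncodedSrc =
        "&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;" +
        "&#97;&#108;&#101;&#114;&#116;&#40;&#39;&#88;&#83;&#83;&#39;&#41;";

    public static string DecodeSrc(string src)
    {
        // Resolve numeric character references such as &#106; into characters.
        return WebUtility.HtmlDecode(src);
    }

    public static void Main()
    {
        string decoded = DecodeSrc(EncodedSrc);
        Console.WriteLine(decoded);
        // prints: javascript:alert('XSS')

        // A naive check for "javascript:" on the raw input would miss this.
        Console.WriteLine(EncodedSrc.Contains("javascript:")); // prints: False
        Console.WriteLine(decoded.StartsWith("javascript:"));  // prints: True
    }
}
```

This is why a simple string match on the raw attribute value is not enough to make an IMG tag safe.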

Take a look at a more thorough XSS cheat sheet for further examples.