I have a simple forum application. When someone posts any content, I do:
```csharp
post.Content = Sanitizer.GetSafeHtml(post.Content);
```
Now, I am not sure whether I am doing something wrong or what is going on, but it allows almost no HTML. Even a simple <b></b> is too much for it, so I guess that tool is totally useless.
Now my question: can anyone tell me how I should sanitize my users' input so that they can post images (<img> tags) and use bold emphasis, etc.?
It seems that many people find the sanitizer rather useless. Instead of using it, just HTML-encode everything and then decode the safe parts back:
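```csharp
using System.Collections.Generic; // IEnumerable<T>
using System.Linq;                // Select, ToArray
using System.Text;                // StringBuilder

// The containing class name is illustrative.
public static class ForumSanitizer
{
    // Exact tag strings allowed back in; tags with attributes stay encoded.
    private static readonly IEnumerable<string> WhitelistedTags =
        new[] { "<b>", "</b>", "<i>", "</i>" };

    // (encoded form, original form) pairs, computed once.
    private static readonly (string Encoded, string Decoded)[] DecodingPairs =
        WhitelistedTags
            .Select(tag => (Microsoft.Security.Application.Encoder.HtmlEncode(tag), tag))
            .ToArray();

    public static string Sanitize(string html)
    {
        // Encode the whole thing
        var safeHtml = Microsoft.Security.Application.Encoder.HtmlEncode(html);
        var builder = new StringBuilder(safeHtml);

        // Decode the safe parts
        foreach (var (encodedTag, decodedTag) in DecodingPairs)
        {
            builder.Replace(encodedTag, decodedTag);
        }
        return builder.ToString();
    }
}
```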
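To see the encode-then-decode idea in action without the AntiXSS package, here is a minimal self-contained sketch. It assumes `System.Net.WebUtility.HtmlEncode` as a stand-in for `Microsoft.Security.Application.Encoder.HtmlEncode`; the class name `WhitelistSanitizer` is hypothetical.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text;

// Sketch only: WebUtility.HtmlEncode stands in for the AntiXSS
// Encoder.HtmlEncode so this compiles without extra packages.
public static class WhitelistSanitizer
{
    // Exact tag strings allowed back in; anything with attributes stays encoded.
    private static readonly string[] WhitelistedTags = { "<b>", "</b>", "<i>", "</i>" };

    // Precomputed (encoded, decoded) pairs, e.g. ("&lt;b&gt;", "<b>").
    private static readonly (string Encoded, string Decoded)[] DecodingPairs =
        WhitelistedTags.Select(tag => (WebUtility.HtmlEncode(tag), tag)).ToArray();

    public static string Sanitize(string html)
    {
        // 1. Encode everything, so no markup survives.
        var builder = new StringBuilder(WebUtility.HtmlEncode(html));

        // 2. Turn only the exact whitelisted tags back into markup.
        foreach (var (encoded, decoded) in DecodingPairs)
        {
            builder.Replace(encoded, decoded);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // prints: <b>bold</b> and &lt;script&gt;alert(1)&lt;/script&gt;
        Console.WriteLine(Sanitize("<b>bold</b> and <script>alert(1)</script>"));
    }
}
```

Because the match is on exact strings, variants such as `<B>` or `<b onclick="...">` simply stay encoded; they show up as text instead of being interpreted as markup.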
Please note that it's nearly impossible to safely decode an IMG tag, since there are really simple ways for an attacker to abuse it. Examples:
```html
<IMG SRC="javascript:alert('XSS');">
<IMG SRC=javascript:alert('XSS')>
```
Take a look at the XSS Cheat Sheet for a more thorough list.
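To illustrate why IMG is the problem case, here is a sketch of a hypothetical naive extension (not something to use) that, after encoding everything, decodes any encoded `<img ...>` tag wholesale. It again assumes `System.Net.WebUtility` in place of AntiXSS; `NaiveImgDecodeDemo` and `NaiveSanitize` are illustrative names.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical naive approach, shown only to illustrate the pitfall:
// encode everything, then decode any encoded <img ...> tag as a whole.
public static class NaiveImgDecodeDemo
{
    // Matches an encoded img tag such as "&lt;IMG SRC=...&gt;" (case-insensitive).
    private static readonly Regex EncodedImgTag =
        new Regex("&lt;img\\b.*?&gt;", RegexOptions.IgnoreCase);

    public static string NaiveSanitize(string html)
    {
        var encoded = WebUtility.HtmlEncode(html);
        // Decoding the whole matched tag also restores its attributes,
        // including a javascript: URL in SRC.
        return EncodedImgTag.Replace(encoded, m => WebUtility.HtmlDecode(m.Value));
    }

    public static void Main()
    {
        // The attack payload comes back out unchanged.
        Console.WriteLine(NaiveSanitize("<IMG SRC=javascript:alert('XSS')>"));
    }
}
```

Allowing the tag means trusting its attributes, and that is exactly where payloads like the ones above hide.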