How to save HTML to database and retrieve it properly

1110 · Feb 9, 2014 · Viewed 30.1k times

Learning security these days :)
I need to let users enter text in a form and allow them some HTML tags (bold, italic, lists, etc.) while preventing them from adding dangerous JavaScript code.
So I have used this whitelist implementation to sanitize the HTML.
But I am still confused about how to save and display it the right way.
Here is what I did:
Model:

public class Post
{
    [AllowHtml]
    public string Data { get; set; }
}

Controller:

[HttpPost, ActionName("Create")]
[ValidateAntiForgeryToken]
public ActionResult Create(Post model)
{
    // Decode model.Data, as it arrives HTML-encoded after the post
    string decodedString = HttpUtility.HtmlDecode(model.Data);
    // Clean the HTML against the whitelist
    string sanitizedHtmlText = HtmlUtility.SanitizeHtml(decodedString);
    // Encode the sanitized result
    string encoded = HttpUtility.HtmlEncode(sanitizedHtmlText);
    // ...
}

View:

@using (Html.BeginForm("Create", "Home", FormMethod.Post)) {    
    @Html.AntiForgeryToken()
    @Html.TextAreaFor(a=>a.Data)
    <input type="submit" value="submit" />
}

So when I post the form I see:

<p>Simple <em><strong>whitelist</strong> </em>test:</p>
<ul>
<li>t1</li>
<li>t2</li>
</ul>
<p>Image:</p>
<p>&lt;img src="http://metro-portal.hr/img/repository/2010/06/medium/hijena_shutter.jpg" /&gt;</p>

Because of the <p>&lt; I think that I need to decode it first:

<p>Simple <em><strong>whitelist</strong> </em>test:</p>
<ul>
<li>t1</li>
<li>t2</li>
</ul>
<p>Image:</p>
<p><img src="http://metro-portal.hr/img/repository/2010/06/medium/hijena_shutter.jpg" /></p>

Then I sanitize it against whitelist and I get sanitized HTML:

<p>Simple <em><strong>whitelist</strong> </em>test:</p>
<ul>
<li>t1</li>
<li>t2</li>
</ul>
<p>Image:</p>
<p>

1) Should I save it like this in the database?
2) Or do I need to encode this result and then save it to the database (encoded below)?

&lt;p&gt;Simple &lt;em&gt;&lt;strong&gt;whitelist&lt;/strong&gt; &lt;/em&gt;test:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;t1&lt;/li&gt;
&lt;li&gt;t2&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Image:&lt;/p&gt;
&lt;p&gt;

Here is where I am confused. If I put it on the view like this:

@Model.Data

I get this on the view:

&lt;p&gt;Simple &lt;em&gt;&lt;strong&gt;whitelist&lt;/strong&gt; &lt;/em&gt;test:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;t1&lt;/li&gt; &lt;li&gt;t2&lt;/li&gt; &lt;/ul&gt; &lt;p&gt;Image:&lt;/p&gt; &lt;p&gt;

or

<p>Simple <em><strong>whitelist</strong> </em>test:</p> <ul> <li>t1</li> <li>t2</li> </ul> <p>Image:</p> <p>

So what should I do to display this HTML properly (bold, lists, etc.)?

Answer

Darin Dimitrov · Feb 9, 2014

The rule of thumb is the following:

  1. Store the RAW HTML in your database, without any encoding or sanitizing. A SQL server doesn't care if you store a string containing XSS code.
  2. When displaying this output on your page, make sure that it is sanitized.

So:

[HttpPost, ActionName("Create")]
[ValidateAntiForgeryToken]
public ActionResult Create(Post model)
{
    // store model.Data directly in your database without any cleaning or sanitizing
}

and then when displaying:

@Html.Raw(HtmlUtility.SanitizeHtml(Model.Data))

Notice how I used the Html.Raw helper here to ensure that you don't get double HTML-encoded output. The HtmlUtility.SanitizeHtml function should already take care of sanitizing the value and return a safe string that you can display in your view without further encoding. If, on the other hand, you used @HtmlUtility.SanitizeHtml(Model.Data), the Razor @ expression would HTML-encode the result of SanitizeHtml, which might not be what you are looking for.
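To make the difference between the two outputs in the question concrete, here is a minimal console sketch. It is not Razor itself: RenderWithAt and RenderWithRaw are hypothetical helpers that only approximate what @value and @Html.Raw(value) write to the page, using WebUtility.HtmlEncode as a stand-in for Razor's encoder. The sanitized string is assumed to be what a whitelist sanitizer returned.

```csharp
using System;
using System.Net;

public static class EncodingDemo
{
    // Hypothetical helper: approximates "@value" in Razor,
    // which HTML-encodes the string before writing it.
    public static string RenderWithAt(string value)
    {
        return WebUtility.HtmlEncode(value);
    }

    // Hypothetical helper: approximates "@Html.Raw(value)",
    // which writes the string unchanged.
    public static string RenderWithRaw(string value)
    {
        return value;
    }

    public static void Main()
    {
        // Assumption: this is already-safe markup returned by the sanitizer.
        string sanitized = "<b>t1</b>";

        // Written unchanged, so the browser renders bold text.
        Console.WriteLine(RenderWithRaw(sanitized)); // <b>t1</b>

        // Encoded once, so the browser shows the tags as literal text.
        Console.WriteLine(RenderWithAt(sanitized));  // &lt;b&gt;t1&lt;/b&gt;

        // Encoding before saving and then using "@" encodes twice.
        string storedEncoded = WebUtility.HtmlEncode(sanitized);
        Console.WriteLine(RenderWithAt(storedEncoded)); // &amp;lt;b&amp;gt;t1&amp;lt;/b&amp;gt;
    }
}
```

The last case is what happens if the HTML is encoded before it is saved and then written with a plain @ expression, which is one reason to keep the stored value raw and decide about encoding only at display time.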