It feels like html_safe adds an abstraction to the String class that requires an understanding of what is going on. For example:
<%= '1 <b>2</b>' %> # gives 1 &lt;b&gt;2&lt;/b&gt; in the HTML source code
<%= h '1 <b>2</b>' %> # exactly the same as above
<%= '1 <b>2</b>'.html_safe %> # 1 <b>2</b> in HTML source code
<%= h '1 <b>2</b>'.html_safe %> # exactly the same as above
<%= h (h '1 <b>2</b>') %> # 1 &lt;b&gt;2&lt;/b&gt; won't escape twice
For line 4, if we are saying that we trust the string and it is safe, why can't we still escape it? It seems that for h to escape a string, the string has to be unsafe.
So on line 1, if the string is not escaped by h, it will be automatically escaped. On line 5, h cannot escape the string twice. In other words, after < is changed to &lt;, it can't be escaped one more time to &amp;lt;.
So what's happening? At first, I thought html_safe just sets a flag on the string, saying it is safe. So then, why does h not escape it? It seems that h and html_safe actually cooperate on using the flag:
1) If a string is html_safe, then h will not escape it.
2) If a string is not html_safe, then when the string is added to the output buffer, it will be automatically escaped by h.
3) If h has already escaped a string, the result is marked html_safe, and therefore escaping it one more time with h has no effect. (This is what happens on line 5. The behavior is the same even in Rails 2.3.10, but in Rails 2.3.5 h can actually escape a string twice. So in Rails 2.3.5 h is a simple escape method, but somewhere along the line to 2.3.10 it became less simple. Rails 2.3.10 won't auto-escape a string, yet for some reason the html_safe method already exists there. For what purpose?)
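To make the three guessed rules concrete, here is a minimal sketch in plain Ruby. It is a toy model only and not Rails code: ToyString, toy_h and toy_render are hypothetical names invented for illustration, and the assumption is simply that a string carries a boolean "safe" flag that both the escape helper and the output step consult.

```ruby
# Toy model only: ToyString, toy_h and toy_render are hypothetical names,
# not Rails APIs. They mimic the flag-based rules guessed at above.
ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
  '"' => '&quot;', "'" => '&#39;'
}.freeze

# Wraps some text together with a "safe" flag.
class ToyString
  attr_reader :text

  def initialize(text, safe: false)
    @text = text
    @safe = safe
  end

  def html_safe?
    @safe
  end

  # Marks the text as trusted without changing its contents.
  def html_safe
    ToyString.new(@text, safe: true)
  end
end

# Rule 1: a safe string is returned untouched.
# Rule 3: an escaped result is itself marked safe, so a second call is a no-op.
def toy_h(str)
  return str if str.html_safe?
  ToyString.new(str.text.gsub(/[&<>"']/, ESCAPES), safe: true)
end

# Rule 2: whatever reaches the output is passed through toy_h first.
def toy_render(str)
  toy_h(str).text
end

raw = ToyString.new('1 <b>2</b>')
puts toy_render(raw)                   # 1 &lt;b&gt;2&lt;/b&gt;
puts toy_render(toy_h(raw))            # 1 &lt;b&gt;2&lt;/b&gt;
puts toy_render(raw.html_safe)         # 1 <b>2</b>
puts toy_render(toy_h(raw.html_safe))  # 1 <b>2</b>
puts toy_render(toy_h(toy_h(raw)))     # 1 &lt;b&gt;2&lt;/b&gt;
```

The five puts lines reproduce the five template lines from the question. Under this model, h never "un-trusts" a string, which would explain why calling h on an html_safe string leaves the tags intact.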
Is that how it works exactly? I think nowadays, when we don't get what we want in the output, we sometimes immediately add html_safe to our variable, which can be quite dangerous because it can introduce an XSS vulnerability. So understanding exactly how it works can be quite important. The above is only a guess at how it works. Could it actually be a different mechanism, and is there any documentation that supports it?
Calling html_safe on a string turns it into an HTML-safe SafeBuffer.
Any operation on a SafeBuffer that could affect the string's safety is passed through h().
h uses this flag to avoid double escaping.
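As a rough illustration of a buffer that escapes on the way in, here is a small plain-Ruby sketch. It is a toy model and not the real ActiveSupport::SafeBuffer: ToyBuffer, its append method, the trusted: keyword and escape_html are all hypothetical names, and the only assumption is that untrusted fragments get escaped when they are appended while trusted ones are copied as-is.

```ruby
# Toy model only: ToyBuffer and escape_html are hypothetical names, not the
# real ActiveSupport::SafeBuffer implementation.
ESCAPE_TABLE = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
  '"' => '&quot;', "'" => '&#39;'
}.freeze

def escape_html(text)
  text.gsub(/[&<>"']/, ESCAPE_TABLE)
end

# An output buffer that stays safe by escaping untrusted input on append.
class ToyBuffer
  def initialize
    @html = +''
  end

  # Appends a fragment; anything not explicitly trusted is escaped first.
  def append(fragment, trusted: false)
    @html << (trusted ? fragment : escape_html(fragment))
    self
  end

  def to_s
    @html.dup
  end
end

buffer = ToyBuffer.new
buffer.append('<p>', trusted: true)
buffer.append('<script>alert(1)</script>') # untrusted, so it is escaped
buffer.append('</p>', trusted: true)
puts buffer.to_s
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

In this model, passing trusted: true plays the role of calling html_safe: if the script fragment had been appended that way, it would have reached the page unescaped.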
The behavior did change, and I think you are mostly correct about how it works. In general, you should not call html_safe on a string unless you're sure that it has already been sanitized. Like anything, you have to be careful while using it.