Something that made me curious - supposedly the default character encoding in HTML5 is UTF-8. However if I have a plain simple HTML file with an HTML5 doctype like the code below, I get:
"hello" in Russian: "ЗдраÑтвуйте"
In Chrome 33+, Safari 6, IE11, etc.
<!DOCTYPE html>
<html>
<head></head>
<body>
<p>"hello" in Russian is "здраствуйте"</p>
</body>
</html>
What gives? Shouldn't the browser utilize the UTF-8 unicode standard and display the text correctly? I'm using Coda which is set to save html files with UTF-8 encoding by default so that's not the problem.
The text data in the example is UTF-8 encoded text misinterpreted as window-1252 encoded. The reason is that the encoding has not been specified and browsers are forced to make a guess. To fix this, specify the encoding; see the W3C page Character encodings. Two simple ways that work independently of server settings, as long as the server does not send wrong encoding information in HTTP headers:
1) Save the file as UTF-8 with BOM (there is probably an option for this in your authoring program.
2) Add the following tag into the head
part:
<meta charset=utf-8>
There is no single default encoding specified for HTML5. On the contrary, browsers are expected to make guesses when no encoding has been declared. This is a fairly complex process, described in 8.2.2.2 Determining the character encoding.