charset=iso-8859-1 with <!DOCTYPE HTML> throwing a warning?

ajax333221 · Jan 3, 2012

I just validated an HTML document using the W3C validator, and found that if I use:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

with:

<!DOCTYPE HTML>

it throws a warning:

  • Line 4, Column 72: Using windows-1252 instead of the declared encoding iso-8859-1.

However, the warning goes away if I use:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

I don't really understand what is happening. I also don't know how to use the DOCTYPE tag; I just copied and pasted one from around the web.

Can someone point me in the right direction to understand:

  • why this happens
  • and, how to use the DOCTYPE tag

Answer

Alohci · Jan 3, 2012

Changing the DOCTYPE simply turns off the warning; it doesn't actually fix anything.

iso-8859-1 and windows-1252 are very similar encodings. They differ only in the characters assigned to the 32 byte values from 0x80 to 0x9F: iso-8859-1 maps them to control characters, while windows-1252 maps most of them to useful printable characters such as the Euro symbol and curly quotes.
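To make the difference concrete, here is a minimal Python sketch that decodes the same bytes both ways. It relies on Python's built-in "iso-8859-1" and "cp1252" codecs standing in for what a browser would do; the names DIFFERING_BYTES and decode_both are made up for this illustration.

```python
import unicodedata

# The 32 byte values where the two encodings disagree.
DIFFERING_BYTES = bytes(range(0x80, 0xA0))


def decode_both(raw):
    """Return (iso-8859-1 text, windows-1252 text) for the same bytes."""
    as_latin1 = raw.decode("iso-8859-1")
    # Python's cp1252 codec leaves a few of these bytes undefined, so
    # errors="replace" substitutes U+FFFD instead of raising an exception.
    as_cp1252 = raw.decode("cp1252", errors="replace")
    return as_latin1, as_cp1252


latin1_text, cp1252_text = decode_both(DIFFERING_BYTES)

# iso-8859-1 turns every one of these bytes into a control character.
print(all(unicodedata.category(ch) == "Cc" for ch in latin1_text))

# windows-1252 turns byte 0x80 into the Euro sign (U+20AC).
print(cp1252_text[0] == "\u20ac")

# Every other byte value decodes identically under both encodings.
other_bytes = bytes(range(0x00, 0x80)) + bytes(range(0xA0, 0x100))
print(other_bytes.decode("iso-8859-1") == other_bytes.decode("cp1252"))
```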

The control characters are useless in HTML, and web authors often mistakenly declare iso-8859-1 yet use one or more of those 32 values as if they were using windows-1252. Browsers therefore treat a declared iso-8859-1 charset as windows-1252 automatically.
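The relabeling described above can be sketched roughly as follows. This is only an illustration, not how any particular browser is implemented: the override table is a small hand-picked subset, and effective_encoding and decode_like_browser are hypothetical helper names.

```python
# Hypothetical, deliberately incomplete table of declared labels that
# get treated as windows-1252 instead of what the author wrote.
LABEL_OVERRIDES = {
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
}


def effective_encoding(declared):
    """Return the encoding that would actually be used for a declared label."""
    label = declared.strip().lower()
    return LABEL_OVERRIDES.get(label, label)


def decode_like_browser(raw, declared):
    """Decode raw bytes using the overridden encoding, not the declared one."""
    return raw.decode(effective_encoding(declared), errors="replace")


# Bytes 0x93 and 0x94 are curly quotes in windows-1252, so a page that
# declares iso-8859-1 still renders them as quotes.
print(decode_like_browser(b"\x93hi\x94", "ISO-8859-1"))
```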

The validator is simply warning you that this will happen. If you're not using any of the 32 byte values, then you can simply ignore the warning - it's NOT an error. If you are, and you genuinely want the iso-8859-1 interpretation of the byte values and not the windows-1252 interpretation, you are doing something wrong.
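One way to check whether a document uses any of those 32 byte values is to scan its raw bytes. The sketch below assumes the file really is stored in a single-byte encoding (in a UTF-8 file, bytes in this range appear as parts of multi-byte sequences, so the check would be meaningless); find_c1_range_bytes is a made-up helper name.

```python
def find_c1_range_bytes(raw):
    """Return (offset, byte value) pairs for every byte in 0x80-0x9F."""
    return [(i, b) for i, b in enumerate(raw) if 0x80 <= b <= 0x9F]


# Example input: a fragment containing byte 0x80, which windows-1252
# shows as the Euro sign and iso-8859-1 treats as a control character.
sample = b"<p>Price: \x80 5</p>"
hits = find_c1_range_bytes(sample)

for offset, value in hits:
    print(f"offset {offset}: byte 0x{value:02X}")
```

An empty result means the two encodings give the same text for that input, so the warning can be ignored. To check a real file, read it in binary mode, for example with open(path, "rb").read(), and pass the bytes to the function.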

Again, this switching happens in browsers for any DOCTYPE; the HTML5 validator is just being more helpful about what it tells you than the HTML4 validator is.