UTF-8 encoded html pages show � (questions marks) instead of characters

leugim picture leugim · Mar 26, 2011 · Viewed 120.1k times · Source

I have the standard XAMPP installation on win7 (x64). Having had my share of encoding troubles in a past project where mysql encoding did not match with the php enconding which in turn sometimes output html in other encodings, I decided to consistently encode everything using utf-8.

I'm just getting started with the html markup and am allready experiencing troubles.

  • My page is saved using utf-8 (no BOM, I think)
    //update: It turns out this was NOT the case. The file was actually saved with ISO_8859-1. I later found this out thanks to Sherm Pendleys answer. I had to go back and change my project settings (which were set to "ISO-8859-1") to the desired "UTF-8".
  • php is set per .htaccess to serve .php-pages in utf-8 with: AddCharset UTF-8 .php
  • html has a meta tag specifying: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  • To test I set used php header('Content-Type:text/html; charset=UTF-8');

The page is evidently served in utf-8 (firefox and chrome recognize it as such) but any special characters such as é, á or ¡ will just show as . Also when viewing the source code.

When dropping the encoding settings mentioned above all characters are rendered correctly but the encoding that is detected shows either windows-1252 or ISO-8859-1 depending on the browser.

How come? I'm very puzzled. I would have expected the exact opposite behavior.
Any advice is welcome, thanks!

edit: Hopefully this helps a bit more. This is the response header (as per firebug)

HTTP/1.1 200 OK
Date: Sat, 26 Mar 2011 20:49:44 GMT
Server: Apache/2.2.14 (Win32) DAV/2 mod_ssl/2.2.14 OpenSSL/0.9.8l mod_autoindex_color PHP/5.3.1 mod_apreq2-20090110/2.7.1 mod_perl/2.0.4 Perl/v5.10.1
X-Powered-By: PHP/5.3.1
Content-Length: 91
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8

Answer

Sherm Pendley picture Sherm Pendley · Mar 26, 2011

When [dropping] the encoding settings mentioned above all characters [are rendered] correctly but the encoding that is detected shows either windows-1252 or ISO-8859-1 depending on the browser.

Then that's what you're really sending. None of the encoding settings in your bullet list will actually modify your output in any way; all they do is tell the browser what encoding to assume when interpreting what you send. That's why you're getting those �s - you're telling the browser that what you're sending is UTF-8, but it's really ISO-8859-1.