HTML validation error: Non-space characters found before DOCTYPE

Smarty · Nov 8, 2011 · Viewed 16.8k times

I have a WordPress-based blog, and I tried to validate one of its pages with the W3C validator. The first error is:

Line 1, Column 1: Non-space characters found without seeing a doctype first. Expected <!DOCTYPE html>.
<!DOCTYPE html><!-- HTML 5 -->

DebugBar (http://www.my-debugbar.com/wiki/IETester/HomePage) agrees: when I open the same page from the "HTML Check" tab inside this tool, it shows two invisible characters before <!. However:

  1. This line of HTML comes from the file header.php in my WordPress theme.
  2. I downloaded this file from my host to my local hard drive.
  3. The first line of header.php is <!DOCTYPE html><!-- HTML 5 -->
  4. When I open header.php in RJ TextEd (an advanced text editor), it reports the file's encoding as UTF-8 without BOM.
  5. When I open header.php in a hex viewer, bytes 0 and 1 are 3c and 21, which is exactly <!.
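
The hex-viewer check in step 5 can also be scripted. Below is a minimal sketch in Python, assuming the UTF-8 BOM is the byte sequence EF BB BF; the helper names `starts_with_utf8_bom` and `describe_prefix` are made up for this illustration.

```python
import codecs


def starts_with_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string begins with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def describe_prefix(data: bytes, count: int = 4) -> str:
    """Return the first `count` bytes as space-separated hex, like a hex viewer."""
    return " ".join(f"{b:02x}" for b in data[:count])


# Sample byte strings for illustration: the first line of header.php,
# once without and once with a BOM in front.
clean = b"<!DOCTYPE html><!-- HTML 5 -->"
with_bom = codecs.BOM_UTF8 + clean

print(describe_prefix(clean))     # 3c 21 44 4f
print(describe_prefix(with_bom))  # ef bb bf 3c
```

A file whose first bytes are 3c 21, as here, has no BOM of its own, so any stray bytes in the served page would have to come from somewhere else.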

So, all things considered, where do these "odd symbols" come from, and why?

Answer

Smarty · Nov 9, 2011

I found the root of the problem. The general rule is:

If any file (absolutely any!) that takes part in building the final HTML page (the one sent to the client) is saved with a BOM, the final HTML page will contain that BOM. In other words, your whole site should not contain even one file with a BOM.
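
One way to picture why this happens: PHP sends any bytes that sit outside its `<?php ... ?>` tags to the client unchanged, so a BOM at the top of a file loaded before the header ends up in front of the doctype. The following Python sketch is a deliberately simplified model of that behaviour, not how PHP is implemented; `literal_prefix` and the sample file contents are hypothetical.

```python
import codecs

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"


def literal_prefix(php_source: bytes) -> bytes:
    """Bytes before the first '<?php' tag (simplified model of literal output)."""
    index = php_source.find(b"<?php")
    return php_source if index == -1 else php_source[:index]


# Hypothetical file contents, for illustration only.
config_with_bom = BOM + b"<?php /* settings */ ?>"
header_clean = b"<!DOCTYPE html><!-- HTML 5 -->"

# The config file is loaded first, so its literal prefix precedes the header.
page = literal_prefix(config_with_bom) + header_clean
print(page[:5])  # b'\xef\xbb\xbf<!'
```

This would also explain why header.php itself can be clean while the served page is not.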

In my case, about 1.3K files make up my site. Only 4 of them had a BOM:

  • wp-config.php (in root of site)
  • jquery.query.js (in include folder)
  • cyr-to-lat.php (in plug-in folder)
  • footer.php (in theme root folder)
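
With that many files, checking them by hand is impractical. Here is a rough sketch of an automated search in Python, assuming the site has been downloaded to a local folder; the function name `find_bom_files` and the list of file extensions are choices made for this example, not something the site requires.

```python
import codecs
from pathlib import Path


def find_bom_files(root: str, suffixes=(".php", ".js", ".css", ".html")) -> list[Path]:
    """Return files under `root` whose first three bytes are the UTF-8 BOM."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            with path.open("rb") as handle:
                # Only the first three bytes matter for this check.
                if handle.read(3) == codecs.BOM_UTF8:
                    hits.append(path)
    return hits


if __name__ == "__main__":
    # Scan the current directory; replace "." with the local copy of the site.
    for path in find_bom_files("."):
        print(path)
```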

I had to re-save each of these 4 files as "UTF-8 without BOM" to get rid of the "Non-space characters" validation error. Once I re-saved the files, the error was gone.
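
If no editor with a "without BOM" option is at hand, the same re-save can be approximated with a short script. This is only a sketch in Python, assuming the file is UTF-8 and that removing the three leading BOM bytes is all that is needed; `strip_utf8_bom` is a name invented here, and it is worth keeping a backup before running it on real files.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: str) -> bool:
    """Remove a leading UTF-8 BOM in place; return True if one was removed."""
    file_path = Path(path)
    data = file_path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do, file is left untouched
    # Rewrite the file without its first three bytes.
    file_path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

For example, `strip_utf8_bom("footer.php")` would return True the first time on a file saved with a BOM and False on every later call.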