I had a discussion with a colleague of mine about the XML declaration node (I'm talking about this: <?xml version="1.0" encoding="UTF-8"?>).
I believe that for something to be called "valid XML", it requires an XML declaration node.
My colleague states that the XML declaration node is optional, since the default encoding is UTF-8 and the version is always 1.0. This makes sense, but what does the standard say?
In short, given the following file:
<books>
<book id="1"><title>Title</title></book>
</books>
Can we say that:
Thank you very much.
This:
<?xml version="1.0" encoding="UTF-8"?>
is not a processing instruction - it is the XML declaration. Its purpose is to configure the XML parser correctly before it starts reading the rest of the document.
It looks like a processing instruction, but unlike a real processing instruction it will not be part of the DOM the parser creates.
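As a rough illustration of that difference, here is a minimal sketch using Python's standard xml.dom.minidom. The sample document and the processing instruction target my-app are made up for the example. It lists the direct children of the parsed document: the real processing instruction shows up as a node, while the XML declaration does not.

```python
from xml.dom import minidom

# Hypothetical sample: an XML declaration, then a genuine processing
# instruction (target "my-app" is invented), then the root element.
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<?my-app some instruction?>'
    b'<books/>'
)


def top_level_nodes(xml_bytes):
    """Return the node names of the document's direct children."""
    doc = minidom.parseString(xml_bytes)
    return [node.nodeName for node in doc.childNodes]


if __name__ == "__main__":
    # The declaration is consumed by the parser and is absent from this list.
    print(top_level_nodes(SAMPLE))
```

Other DOM implementations may expose the declaration's values (version, encoding) as properties of the document object, but not as a child node.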
It is not necessary for "valid" XML. "Valid" means "represents a well-defined document type, as described in a DTD or a schema". Without a schema or DTD the word "valid" has no meaning.
Many people misuse "valid" when they really mean "well-formed". A well-formed XML document is one that obeys the basic syntax rules of XML.
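A quick way to see what "well-formed" means in practice is to hand a document to a non-validating parser, which checks syntax only. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the two sample strings are assumptions made for the example, not part of any standard.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if a non-validating parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Every start tag has a matching end tag: well-formed.
BALANCED = '<books><book id="1"><title>Title</title></book></books>'

# The root <books> is closed with </book>: a syntax error, not well-formed.
MISMATCHED = '<books><book id="1"><title>Title</title></book></book>'

if __name__ == "__main__":
    print(is_well_formed(BALANCED))
    print(is_well_formed(MISMATCHED))
```

Note that this says nothing about validity: no DTD or schema is consulted at any point.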
There is no XML declaration necessary for a document to be well-formed, either, since there are defaults for both version and encoding (1.0 and UTF-8/UTF-16, respectively). If a Unicode BOM (Byte Order Mark) is present in the file, it determines the encoding. If there is no BOM and no XML declaration, UTF-8 is assumed.
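The encoding defaults can be checked with a small experiment. This is only a sketch, assuming Python's standard xml.etree.ElementTree (which is backed by the expat parser); the helper read_root_text and the sample text are invented for the example. The same element is encoded four ways and fed to the parser as raw bytes.

```python
import xml.etree.ElementTree as ET


def read_root_text(xml_bytes):
    """Return the root element's text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(xml_bytes).text
    except ET.ParseError:
        return None


TEXT = "<title>Caf\u00e9</title>"  # contains one non-ASCII character

# No BOM, no declaration: the parser falls back to UTF-8.
UTF8_NO_DECL = TEXT.encode("utf-8")

# Python's "utf-16" codec prepends a BOM, which the parser uses for detection.
UTF16_WITH_BOM = TEXT.encode("utf-16")

# Latin-1 bytes without a declaration are read as (invalid) UTF-8.
LATIN1_NO_DECL = TEXT.encode("latin-1")

# With an encoding declaration, the same Latin-1 bytes are decoded correctly.
LATIN1_WITH_DECL = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + LATIN1_NO_DECL

if __name__ == "__main__":
    for sample in (UTF8_NO_DECL, UTF16_WITH_BOM, LATIN1_NO_DECL, LATIN1_WITH_DECL):
        print(read_root_text(sample))
```

In other words, the declaration only becomes necessary in practice when the file uses an encoding other than UTF-8 or UTF-16.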
Here is a canonical thread on how encoding declaration and detection work in XML files: "How default is the default encoding (UTF-8) in the XML Declaration?"
To your questions:
You are confusing a few XML concepts here (not to worry, this confusion is common and stems partly from the fact that the concepts overlap and the names are misused rather often).