Can an XSD schema validate the encoding of a document, e.g. UTF-8?

lee · Dec 10, 2010 · Viewed 7.7k times

Using a schema, is there any simple way to validate the encoding of an XML message?

Assume the first line of the XML is not trustworthy, i.e. ignore <?xml version="1.0" encoding="UTF-8"?>.

Answer

James · Dec 10, 2010

No. A schema can't dictate the character encoding. The closest it comes is the binary data types (such as xs:base64Binary and xs:hexBinary), and even their content is still carried inside the high-level encoding of the document itself. This makes sense once you realize that a schema is supposed to describe the information, not the transport format. The XML specification is what dictates how a document is represented and stored in the most general sense; a schema validates that the data stored in the XML meets the structural constraints agreed between the parties. The XML declaration in the prolog (that first line you mention), as defined in the XML spec, is what a compliant XML reader uses, by its presence or absence, to determine how the document is encoded. Encoding is simply the agreement between the endpoints on how Unicode code points are represented as bytes, and it is the XML specification, not the schema, that specifies how this agreement is reached.
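Since the schema can't do it, a check on the raw bytes has to happen before (or alongside) parsing. A minimal sketch in Python, assuming the message arrives as a byte string and that UTF-8 is the encoding you want to enforce; the function name `is_valid_utf8` is made up for illustration:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        # Strict decoding raises on any byte sequence that is not legal UTF-8.
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# The declaration claims UTF-8 in both cases; only the bytes differ.
message = '<?xml version="1.0" encoding="UTF-8"?><name>José</name>'
print(is_valid_utf8(message.encode("utf-8")))    # bytes really are UTF-8
print(is_valid_utf8(message.encode("latin-1")))  # Latin-1 bytes, mislabelled
```

The first call prints True and the second False. Note that this only shows the bytes are well-formed UTF-8, not that the sender intended UTF-8: plain ASCII passes too, and some byte sequences from other encodings can happen to be valid UTF-8.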

If you are interested, this is the relevant section of the XML 1.1 specification on how this agreement is reached and, more interestingly, how a compliant reader can guess the encoding well enough to read the declaration and find the actual encoding value: http://www.w3.org/TR/xml11/#sec-guessing
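To give a feel for that guessing step, here is a rough Python sketch of the idea: look at the first few bytes for a byte order mark or for the byte pattern of "<?xm" in the common encoding families. It is a simplified subset of the table in the spec, not a full implementation, and the function name and return labels are my own:

```python
# (prefix, label) pairs. Longer prefixes come first because
# FF FE (UTF-16LE BOM) is also the start of FF FE 00 00 (UCS-4 LE BOM).
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "UCS-4 BE"),      # BOM
    (b"\xff\xfe\x00\x00", "UCS-4 LE"),      # BOM
    (b"\xef\xbb\xbf", "UTF-8"),             # BOM
    (b"\xfe\xff", "UTF-16BE"),              # BOM
    (b"\xff\xfe", "UTF-16LE"),              # BOM
    (b"\x00\x00\x00\x3c", "UCS-4 BE"),      # "<" without BOM
    (b"\x3c\x00\x00\x00", "UCS-4 LE"),      # "<" without BOM
    (b"\x00\x3c\x00\x3f", "UTF-16BE"),      # "<?" without BOM
    (b"\x3c\x00\x3f\x00", "UTF-16LE"),      # "<?" without BOM
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible"),  # "<?xm": read the declaration
    (b"\x4c\x6f\xa7\x94", "EBCDIC"),        # "<?xm" in EBCDIC
]


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding family of an XML document from its first bytes."""
    for prefix, label in SIGNATURES:
        if data.startswith(prefix):
            return label
    # No BOM and no recognizable declaration: the spec's default is UTF-8.
    return "UTF-8 assumed"


print(sniff_xml_encoding('<?xml version="1.0"?><a/>'.encode("utf-16-le")))
print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
```

The "ASCII-compatible" result is the interesting case: the reader now knows enough to parse the declaration itself and pick up the declared encoding, which is exactly the part you said you don't want to trust.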