Which characters are invalid (unless encoded) in an XML attribute?

Euro Micelli · May 15, 2009 · Viewed 39.9k times

I can't believe I can't find this information easily accessible, so:

1) Which characters cannot be incorporated in an XML attribute without entity-encoding them?

Obviously, you need to encode quotes. What about < and >? What else?

2) Where exactly is the official list?

Answer

great_llama · May 15, 2009

The official definition of what is allowed in an attribute value is the AttValue production in the XML 1.0 specification:

'"' ([^<&"] | Reference)* '"'  |  "'" ([^<&'] | Reference)* "'" 

So, you can't have:

  • the same character that opens/closes the attribute value (either ' or "; write &apos; or &quot; instead)
  • a naked ampersand (& must be &amp;)
  • a left angle bracket (< must be &lt;)
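
As a rough illustration of those three rules, here is a minimal Python sketch. The function name escape_attr and its quote parameter are made up for this example; they are not part of any library. Note that it leaves > alone, since the grammar above does not exclude it.

```python
def escape_attr(value: str, quote: str = '"') -> str:
    """Return value as a quoted XML attribute value (hypothetical helper)."""
    if quote not in ('"', "'"):
        raise ValueError("quote must be ' or \"")
    # Escape the ampersand first, so the entities added below
    # are not escaped a second time.
    value = value.replace("&", "&amp;")
    value = value.replace("<", "&lt;")
    # Only the delimiter actually in use has to be escaped.
    if quote == '"':
        value = value.replace('"', "&quot;")
    else:
        value = value.replace("'", "&apos;")
    return quote + value + quote


print(escape_attr('a < b & "c" > d'))
print(escape_attr("it's", quote="'"))
```

In real code you would probably reach for the standard library instead: xml.sax.saxutils.quoteattr does a similar job and picks the quote character for you.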

You should also not be using any characters that are not legal anywhere in an XML document (such as form feeds).
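
To check for those, one option is to test the text against the Char production of XML 1.0 (tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). The sketch below assumes XML 1.0 rather than 1.1, and the name find_illegal_xml_chars is invented for this example.

```python
import re

# Anything outside the XML 1.0 Char production.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text: str) -> list:
    """Return every character in text that XML 1.0 does not allow."""
    return _ILLEGAL_XML_CHARS.findall(text)


# A form feed and a NUL are both reported; tab and newline are fine.
print(find_illegal_xml_chars("ok\x0cbad\x00"))
print(find_illegal_xml_chars("tab\tnewline\n"))
```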