What is the correct way to represent null XML elements?

Rob Hruska · Apr 21, 2009

I have seen null elements represented in several ways:

The element is present with xsi:nil="true":

 <book>
     <title>Beowulf</title>
     <author xsi:nil="true"/>
 </book>

The element is present, but represented as an empty element (which I believe is wrong since 'empty' and null are semantically different):

 <book>
     <title>Beowulf</title>
     <author/>
 </book>

 <!-- or: -->
 <book>
     <title>Beowulf</title>
     <author></author>
 </book>

The element is not present at all in the returned markup:

 <book>
     <title>Beowulf</title>
 </book>

The element has a <null/> child element (from TStamper below):

 <book>
     <title>Beowulf</title>
     <author><null/></author>
 </book>

Is there a correct, or canonical, way to represent such a null value? Are there other ways besides the examples above?

The XML for the examples above is contrived, so don't read too far into it. :)

Answer

KitsuneYMG · Apr 21, 2009

xsi:nil is the correct way to represent a value such that, when the DOM Level 2 call getElementValue() is issued, NULL is returned. xsi:nil is also used to indicate a valid element with no content, even if that element's content type normally doesn't allow empty elements.

If an empty tag is used, getElementValue() returns the empty string (""). If the tag is omitted, then no author element is present at all. This may be semantically different from setting it to nil. (For example, setting "Series" to nil may mean that the book belongs to no series, while omitting series could mean that series is inapplicable to the current element.)
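The three cases described here (nil, empty, omitted) can be told apart in code. The following is a minimal sketch using Python's standard xml.etree.ElementTree rather than the DOM call mentioned above; the function name author_state and the labels it returns are made up for illustration, and the sample documents are adapted from the question.

```python
import xml.etree.ElementTree as ET

# Namespace that the xsi prefix is conventionally bound to.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# ElementTree stores namespaced attribute names as "{namespace}local".
NIL_ATTR = f"{{{XSI_NS}}}nil"


def author_state(book_xml: str) -> str:
    """Classify the <author> child as 'absent', 'nil', 'empty', or 'value'.

    Hypothetical helper: the name and the labels are illustrative only.
    """
    book = ET.fromstring(book_xml)
    author = book.find("author")
    if author is None:
        return "absent"  # element omitted entirely
    if author.get(NIL_ATTR) in ("true", "1"):
        return "nil"  # explicitly marked null
    if not author.text and len(author) == 0:
        return "empty"  # <author/> or <author></author>
    return "value"


NIL_DOC = (
    f'<book xmlns:xsi="{XSI_NS}">'
    '<title>Beowulf</title><author xsi:nil="true"/></book>'
)
EMPTY_DOC = "<book><title>Beowulf</title><author></author></book>"
ABSENT_DOC = "<book><title>Beowulf</title></book>"

for doc in (NIL_DOC, EMPTY_DOC, ABSENT_DOC):
    print(author_state(doc))
```

Note that the xsi prefix has to be declared on the document (here on the book element) for xsi:nil to parse at all; the snippets in the question omit that declaration for brevity.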

From: The W3C

XML Schema: Structures introduces a mechanism for signaling that an element should be accepted as ·valid· when it has no content despite a content type which does not require or even necessarily allow empty content. An element may be ·valid· without content if it has the attribute xsi:nil with the value true. An element so labeled must be empty, but can carry attributes if permitted by the corresponding complex type.
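The last sentence of that passage (a nil element must be empty but may carry attributes) is easy to check mechanically. Below is a rough sketch in Python with the standard library; it only looks at the instance document and does not consult a schema, so it cannot tell whether the element was declared nillable in the first place. The helper name nil_elements_with_content and the role attribute are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
NIL_ATTR = f"{{{XSI_NS}}}nil"


def nil_elements_with_content(xml_text: str) -> list:
    """Return the tags of elements that are marked nil but still have content.

    Hypothetical helper: it checks only the "must be empty" rule, not
    full schema validity.
    """
    root = ET.fromstring(xml_text)
    offenders = []
    for elem in root.iter():
        is_nil = elem.get(NIL_ATTR) in ("true", "1")
        # Any character data or child element counts as content.
        has_content = bool(elem.text) or len(elem) > 0
        if is_nil and has_content:
            offenders.append(elem.tag)
    return offenders


# Nil and empty, with an extra (illustrative) attribute: acceptable.
GOOD = f'<book xmlns:xsi="{XSI_NS}"><author xsi:nil="true" role="editor"/></book>'
# Nil but with text content: violates the rule quoted above.
BAD = f'<book xmlns:xsi="{XSI_NS}"><author xsi:nil="true">Unknown</author></book>'

print(nil_elements_with_content(GOOD))
print(nil_elements_with_content(BAD))
```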

A clarification:
If you have a book XML element and one of its child elements is book:series, you have several options when filling it out:

  1. Removing the element entirely - This can be done when you wish to indicate that series does not apply to this book or that the book is not part of a series. In this case, a template matching book:series in an XSL transform (or other event-based processor) will never be called. For example, if your XSL turns the book element into a table row (xhtml:tr), you may get the wrong number of table cells (xhtml:td) using this method.
  2. Leaving the element empty - This could indicate that the series is "", or is unknown, or that the book is not part of a series. Any XSL transform (or other event-based processor) with a template that matches book:series will call it. The value of current() will be "". You will get the same number of xhtml:td tags using this method as with the next one.
  3. Using xsi:nil="true" - This signifies that the book:series element is NULL, not just empty. Any XSL transform (or other event-based processor) with a template matching book:series will call it. The value of current() will be empty (not the empty string). The main difference between this method and (2) is that the schema type of book:series does not need to allow the empty string ("") as a valid value. This makes no real sense for a series element, but for a language element that is defined as an enumerated type in the schema, xsi:nil="true" allows the element to have no data. Another example would be elements of type decimal. If you want them to be empty, you can either use a union of a decimal and an enumerated string that only allows "", or use a decimal that is nillable.
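The decimal case in option 3 shows why the distinction matters to consuming code: "" is not a valid decimal, whereas a nil element maps cleanly onto a null value. Here is a small sketch of that mapping in Python, assuming a hypothetical price child element and a made-up helper read_price; neither appears in the discussion above.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation
from typing import Optional

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
NIL_ATTR = f"{{{XSI_NS}}}nil"


def read_price(book_xml: str) -> Optional[Decimal]:
    """Read a nillable decimal <price> child (hypothetical element name).

    nil     -> None
    omitted -> LookupError, since absence is a different situation from null
    empty   -> decimal.InvalidOperation, since "" is not a decimal
    """
    book = ET.fromstring(book_xml)
    price = book.find("price")
    if price is None:
        raise LookupError("no <price> element")
    if price.get(NIL_ATTR) in ("true", "1"):
        return None
    return Decimal(price.text or "")


NIL_PRICE = f'<book xmlns:xsi="{XSI_NS}"><price xsi:nil="true"/></book>'
SET_PRICE = "<book><price>9.99</price></book>"
EMPTY_PRICE = "<book><price/></book>"

print(read_price(NIL_PRICE))
print(read_price(SET_PRICE))
try:
    read_price(EMPTY_PRICE)
except InvalidOperation:
    print("empty <price/> is not a valid decimal")
```

Whether an omitted element should raise, as it does here, or also map to None is a design choice that depends on what omission means in your schema.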