Easiest way to write and read an XML file

madu · Nov 23, 2011 · Viewed 27.6k times

I'd like to know the easiest way to write and parse an XML file in Android.

My requirement is very simple. A sample file would be something like:

<Item ID="1" price="$100" Qty="20" />

I only want to retrieve an item by its ID and read its price and Qty.

I was referring to Using XmlResourceParser to Parse Custom Compiled XML, but I'm wondering whether there is a much more lightweight way to do something as trivial as this (still using tags).

Answer

TextGeek · Aug 28, 2017

If it's really that simple, you can just write it with printf() or similar.
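As a rough sketch of the printf() route in Java (the question is about Android), String.format can emit the element directly. The class and method names are hypothetical, and the small escaping helper is an addition of mine: printf alone won't stop a value containing ", & or < from producing broken XML. The Items wrapper element is also an assumption, since a file holding several items needs a single root element.

```java
// Hypothetical helper class: builds <Item .../> elements with String.format,
// the Java counterpart of printf().
public class ItemWriter {

    // Escape the characters that would break a double-quoted attribute value.
    // & is replaced first so the other escapes are not double-escaped.
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace("\"", "&quot;");
    }

    // One empty-element tag in the same shape as the sample in the question.
    static String formatItem(String id, String price, String qty) {
        return String.format("<Item ID=\"%s\" price=\"%s\" Qty=\"%s\" />",
                escapeAttr(id), escapeAttr(price), escapeAttr(qty));
    }

    public static void main(String[] args) {
        // Assumed wrapper element: several items need one root element.
        StringBuilder doc = new StringBuilder("<Items>\n");
        doc.append("  ").append(formatItem("1", "$100", "20")).append('\n');
        doc.append("  ").append(formatItem("2", "$250", "5")).append('\n');
        doc.append("</Items>\n");
        // On Android you would write this string to a file (for example via
        // Context.openFileOutput) rather than print it.
        System.out.print(doc);
    }
}
```

Writing is the easy half; the escaping step is the only part that really needs care.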

For parsing, you're best off using a real XML parser (perhaps the SimpleXML that @netpork suggested). But for something truly this trivial, you could just use regexes. Here's my usual set; you'd mainly need 'attrlist' and 'startTag' (for the attribute list and start-tag).

my $xname      = "([_\\w][-_:.\\w\\d]*)";         # XML NAME (imperfect charset)
my $xnmtoken   = "([-_:.\\w\\d]+)";               # Name token
my $xncname    = "([_\\w][-_.\\w\\d]*)";          # Name without colons
my $qlit       = '("[^"]*"|\'[^\']*\')';          # Includes the quotes
my $attr       = "$xname\\s*=\\s*$qlit";          # Captures name and value
my $attrlist   = "(\\s+$attr)*";                  # Zero or more attributes
my $startTag   = "<$xname$attrlist\\s*/?>";       # Start-tag or empty-element tag
my $endTag     = "</$xname\\s*>";                 # End-tag
my $comment    = "(<!--[^-]*(-[^-]+)*-->)";       # Includes delims
my $pi         = "(<\\?$xname.*?\\?>)";           # Processing instruction
my $dcl        = "(<!$xname\\s+[^>]+>)";          # Markup dcl (imperfect)
my $cdataStart = "(<!\\[CDATA\\[)";               # Marked section open
my $cdataEnd   = "(]]>)";                         # Marked section close
my $charRef    = "&(#\\d+|#[xX][0-9a-fA-F]+);";   # Num char ref (no delims)
my $entRef     = "&$xname;";                      # Named entity ref
my $pentRef    = "%$xname;";                      # Parameter entity ref
my $xtext      = "[^<&]*";                        # Neglects ']]>'
my $xdocument  = "^($startTag|$endTag|$pi|$comment|$charRef|$entRef|$xtext)+\$";
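For illustration, here is a rough Java port of just xname, qlit, attr, attrlist and startTag, used to pull one Item out by ID. The class and method names are hypothetical, and attrlist is wrapped in one extra capturing group so the whole attribute list can be read in one piece. It carries all the drawbacks of the regex approach: it is a text scan, not a parser.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: regex lookup of an <Item> by ID. Not a real XML parser.
public class RegexItemLookup {
    static final String XNAME = "([_\\w][-_:.\\w\\d]*)";
    static final String QLIT = "(\"[^\"]*\"|'[^']*')";           // includes the quotes
    static final String ATTR = XNAME + "\\s*=\\s*" + QLIT;
    // Extra outer group (group 2 of START_TAG) captures the whole attribute list.
    static final String ATTRLIST = "((?:\\s+" + ATTR + ")*)";
    static final String START_TAG = "<" + XNAME + ATTRLIST + "\\s*/?>";

    static final Pattern START_TAG_RE = Pattern.compile(START_TAG);
    static final Pattern ATTR_RE = Pattern.compile(ATTR);

    // Returns the attributes of the first <Item> whose ID matches, or null.
    static Map<String, String> findItem(String xml, String id) {
        Matcher tag = START_TAG_RE.matcher(xml);
        while (tag.find()) {
            if (!tag.group(1).equals("Item")) continue;      // skip other tags
            Map<String, String> attrs = new HashMap<>();
            Matcher a = ATTR_RE.matcher(tag.group(2));
            while (a.find()) {
                String quoted = a.group(2);
                // Strip the surrounding quotes that QLIT keeps.
                attrs.put(a.group(1), quoted.substring(1, quoted.length() - 1));
            }
            if (id.equals(attrs.get("ID"))) return attrs;
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<Items>\n"
                + "  <Item ID=\"1\" price=\"$100\" Qty=\"20\" />\n"
                + "  <Item ID='2' price='$250' Qty='5'/>\n"
                + "</Items>\n";
        Map<String, String> item = findItem(xml, "2");
        if (item != null) {
            System.out.println("price=" + item.get("price") + " Qty=" + item.get("Qty"));
        }
    }
}
```

Entity references such as &amp; come back unexpanded, and a start-tag inside a comment would still be matched.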

A draft of the XML spec even included a "trivial" grammar for XML that can find node boundaries correctly, though it doesn't catch all errors, expand entity references, etc. See https://www.w3.org/TR/WD-xml-lang-970630#secF.

The main drawback is that if you run into fancier data later, this may break. For example, someone might send you data with a comment in it, a syntax error, an unquoted attribute, or using &quo, or whatever.
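To make that trade-off concrete, here is a minimal sketch of the "real parser" route using the DOM parser that ships with both the JDK and Android (javax.xml.parsers). It is a stand-in, not the SimpleXML library mentioned above, and the class and method names are hypothetical. Because it actually parses the markup, it ignores a commented-out decoy Item and expands &amp; in attribute values, both of which a plain regex scan gets wrong.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: look up an <Item> by ID with the built-in DOM parser.
public class DomItemLookup {

    // Returns {price, Qty} for the first <Item> whose ID matches, or null.
    static String[] findItem(String xml, String id) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // Malformed input is reported instead of being silently mis-read.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        // Only element nodes are returned, so tags inside comments are skipped.
        NodeList items = doc.getElementsByTagName("Item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            if (id.equals(item.getAttribute("ID"))) {
                // getAttribute returns the value with entity references expanded.
                return new String[] { item.getAttribute("price"), item.getAttribute("Qty") };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<Items>\n"
                + "  <!-- <Item ID=\"1\" price=\"$0\" Qty=\"0\" /> -->\n"
                + "  <Item ID=\"1\" price=\"$100\" Qty=\"20\" />\n"
                + "  <Item ID=\"2\" price=\"$5 &amp; up\" Qty=\"3\" />\n"
                + "</Items>\n";
        String[] one = findItem(xml, "1");
        System.out.println("price=" + one[0] + " Qty=" + one[1]);   // price=$100 Qty=20
        String[] two = findItem(xml, "2");
        System.out.println("price=" + two[0] + " Qty=" + two[1]);   // price=$5 & up Qty=3
    }
}
```

DOM loads the whole file into memory, which is fine for a small item list; for larger files on Android, the streaming XmlPullParser is the usual lighter-weight choice.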