Should I use XPath or just DOM?

Pete · Mar 4, 2011 · Viewed 8.6k times

I have a bunch of hierarchical data stored in an XML file. I am wrapping that up behind hand-crafted classes using TinyXML. Given an XML fragment that describes a source signature as a set of (frequency, level) pairs a bit like this:

<source>
  <sig><freq>1000</freq><level>100</level></sig>
  <sig><freq>1200</freq><level>110</level></sig>
</source>

I am extracting the pairs with this:

std::vector< std::pair<double, double> > signature() const
{
    std::vector< std::pair<double, double> > sig;
    for (const TiXmlElement* sig_el = node()->FirstChildElement ("sig");
        sig_el;
        sig_el = sig_el->NextSiblingElement("sig"))
    {
        const double level = boost::lexical_cast<double> (sig_el->FirstChildElement("level")->GetText());
        const double freq =  boost::lexical_cast<double> (sig_el->FirstChildElement("freq")->GetText());
        sig.push_back (std::make_pair (freq, level));
    }
    return sig;
}

where node() is pointing at the <source> node.

Question: would I get a neater, more elegant, more maintainable or in any other way better piece of code using an XPath library instead?

Update: I have tried it using TinyXPath in two ways. Neither of them actually works, which is obviously a big point against them. Am I doing something fundamentally wrong? If this is what it is going to look like with XPath, I don't think it is getting me anything.

std::vector< std::pair<double, double> > signature2() const
{
    std::vector< std::pair<double, double> > sig;
    TinyXPath::xpath_processor source_proc (node(), "sig");
    const unsigned n_nodes = source_proc.u_compute_xpath_node_set();
    for (unsigned i = 0; i != n_nodes; ++i)
    {
        TiXmlNode* s = source_proc.XNp_get_xpath_node (i);
        const double level = TinyXPath::xpath_processor(s, "level/text()").d_compute_xpath();
        const double freq =  TinyXPath::xpath_processor(s, "freq/text()").d_compute_xpath();
        sig.push_back (std::make_pair (freq, level));
    }
    return sig;
}

std::vector< std::pair<double, double> > signature3() const
{
    std::vector< std::pair<double, double> > sig;
    int i = 1;
    while (TiXmlNode* s = TinyXPath::xpath_processor (node(), 
        ("sig[" + boost::lexical_cast<std::string>(i++) + "]/*").c_str()).
        XNp_get_xpath_node(0))
    {
        const double level = TinyXPath::xpath_processor(s, "level/text()").d_compute_xpath();
        const double freq =  TinyXPath::xpath_processor(s, "freq/text()").d_compute_xpath();
        sig.push_back (std::make_pair (freq, level));
    }
    return sig;
}

As a secondary issue, if so, which XPath library should I be using?

Answer

Alain Pannetier · Mar 5, 2011

In general I tend to prefer XPath based solutions for their concision and versatility but, honestly, in your case, I don't think using XPath will bring a lot to your signature.

Here is why:

Code elegance
Your code is nice and compact and it will not get any better with an XPath expression.
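If anything, what the DOM loop could gain is robustness rather than brevity. In TinyXML, FirstChildElement returns a null pointer when no matching child exists, and GetText returns null when the element has no text, so a <sig> that lacks a <freq> or <level> would end up dereferencing null. Below is a minimal sketch of one way to guard the conversion using only the standard library. The names parse_double and append_pair are hypothetical helpers made up for illustration; they are not part of TinyXML or Boost.

```cpp
#include <cstdlib>   // std::strtod
#include <utility>   // std::pair, std::make_pair
#include <vector>

// Hypothetical helper (not part of TinyXML): converts element text to a
// double. Returns false for a null pointer, empty text, or text with
// trailing characters, so a malformed <sig> can be skipped by the caller.
inline bool parse_double(const char* text, double& out)
{
    if (!text || !*text)
        return false;
    char* end = 0;
    const double value = std::strtod(text, &end);
    if (end == text || *end != '\0')   // nothing parsed, or junk after the number
        return false;
    out = value;
    return true;
}

// Hypothetical helper: appends (freq, level) only when both texts parse.
inline bool append_pair(std::vector< std::pair<double, double> >& sig,
                        const char* freq_text, const char* level_text)
{
    double freq = 0.0;
    double level = 0.0;
    if (!parse_double(freq_text, freq) || !parse_double(level_text, level))
        return false;
    sig.push_back(std::make_pair(freq, level));
    return true;
}
```

Inside the existing loop, the two texts would come from the child elements after a null check, for example `el ? el->GetText() : 0` where `el` is the result of FirstChildElement. Whether a bad entry should be skipped silently or reported is a design choice the sketch leaves to the caller through the bool return value.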

Memory footprint
Unless your input XML configuration file is huge (something of an oxymoron) and DOM parsing would entail a large memory footprint, I would stick with DOM. Even in that case, there is no proof that using XPath would be a decisive cure.

Execution Speed
On such a simple XML tree, execution speed should be comparable. If there were a difference, it would probably be to TinyXml's advantage, because the freq and level tags sit together under a given node.

Libraries and external references
That's the decisive point.
The leading XPath engine in the C++ world is XQilla. It supports XQuery (therefore both XPath 1.0 and 2.0) and is backed by Oracle because it's developed by the group responsible for Berkeley DB products (including precisely Berkeley DB XML – which uses XQilla).
The problem for C++ developers wishing to use XQilla is that they have to choose between several alternatives:

  1. use Xerces 2 and XQilla 2.1, and litter your code with casts
  2. use XQilla 2.2+ with Xerces 3 (no casts needed here)
  3. use TinyXPath, which is nicely integrated with TinyXml but has a number of limitations (no support for namespaces, for instance)
  4. mix Xerces and TinyXml

In summary, in your case, switching to XPath just for the sake of it would bring little benefit, if any.

Yet, XPath is a very powerful tool in today's developer toolbox and no one can ignore it. If you just wish to practice on a simple example, yours is as good as any. Then, I'd keep in mind the points above and probably use TinyXPath anyway.