How can I use POCO to parse an xml file and extract a particular node to a std::string?

Totte Karlsson · Mar 25, 2013

I want to extract an individual node using POCO's libraries but can't figure out how to do it. I'm new to XML.

The XML itself looks something like this (abbreviated):

<?xml version="1.0" encoding="UTF-8"?>
<!-- Created by XMLPrettyPrinter on 11/28/2012 from  -->
<sbml xmlns = "http://www.sbml.org/sbml/level2/version4" level = "2" version = "4">
<model id = "cell">
  <listOfSpecies>
  </listOfSpecies>
  <listOfParameters>
     <parameter id = "kk1" value = "1"/>
  </listOfParameters>
  <listOfReactions>
     <reaction id = "J1" reversible = "false">
... much stuff here ..
  </listOfReactions>
</model>
</sbml>

I want to extract everything in the listOfReactions node and store it in a std::string, for later MD5 hashing.

I have tried this:

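#include <fstream>
#include <iostream>
#include <string>
#include "Poco/AutoPtr.h"
#include "Poco/DOM/DOMParser.h"
#include "Poco/DOM/Document.h"
#include "Poco/DOM/NodeFilter.h"
#include "Poco/DOM/NodeIterator.h"
#include "Poco/SAX/InputSource.h"

using namespace std;
using namespace Poco::XML;
using Poco::AutoPtr;

ifstream in(JoinPath(gTestDataFolder, "Test_1.xml").c_str());
InputSource src(in);
DOMParser parser;
AutoPtr<Document> pDoc = parser.parse(&src);
NodeIterator it(pDoc, NodeFilter::SHOW_ALL);
Node* pNode = it.nextNode();

const string elementID = "listOfReactions";
while(pNode)
{
    clog << pNode->nodeName() << endl;
    if(pNode->nodeName() == elementID)
    {
        //Extract everything in this node... how???
    }

    pNode = it.nextNode();
}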

Answer

Poul · Jun 25, 2013

I ran into a similar problem myself. In your case, the "Poco::XML::NodeFilter::SHOW_ALL" filter is applied, so all node types (Element, Text, CDATASection, etc.) are included when iterating through the XML document. I found that each node returned by "nextNode()" carries only its own data. An Element node's "nodeValue()" is empty, and its attributes and text content are stored in separate nodes that you have to look up yourself.

To access an XML node's attributes, first check whether the node has any by calling "hasAttributes()". If it does, iterate through its attributes to find the ones of interest.

XML Example:

<?xml version="1.0"?>
<reaction id="J1" reversible="false"/>

C++ Example:

...
// attributes() returns a map that the caller must release; AutoPtr does that automatically
Poco::AutoPtr<Poco::XML::NamedNodeMap> attributes;
Poco::XML::Node* attribute = NULL;

while(pNode)
{
    if( (pNode->nodeName() == "reaction") && pNode->hasAttributes())
    {
        attributes = pNode->attributes();
        for(unsigned long i = 0; i < attributes->length(); i++)
        {
            attribute = attributes->item(i);
            cout << attribute->nodeName() << " : " << attribute->nodeValue() << endl;
        }
    }
    pNode = it.nextNode();
}
...

Should output:

id : J1
reversible : false

To access the text between two XML tags, as in the XML example below, first find the node whose name matches the tag of interest, as you have done in your example. Then call "nextNode()" once more. With "SHOW_ALL", the node that follows an element is its first child. If that node is named "#text" or "#cdata-section", its value is the text between the tags. In an indented file such as the one in your question, however, that first child may be a text node containing only whitespace.

XML Example:

<?xml version="1.0"?>
<listOfReactions>Some text</listOfReactions>

C++ Example:

...
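while(pNode)
{
    if(pNode->nodeName() == "listOfReactions")
    {
        pNode = it.nextNode();
        if(!pNode)
        {
            break; // reached the end of the document
        }
        if(pNode->nodeName() != "#text" && pNode->nodeName() != "#cdata-section")
        {
            continue; // no text node present; examine this node on the next pass
        }
        cout << "Tag Text: " << pNode->nodeValue() << endl;
    }
    pNode = it.nextNode();
}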
...

Should output:

Tag Text: Some text