PugiXML C++ getting content of an element (or a tag)

Grego · Mar 21, 2012 · Viewed 11k times

I'm using PugiXML in C++ (Visual Studio 2010) to get the content of an element, but it stops reading the value as soon as it sees a "<". It only returns the content up to the first "<" character, even when that "<" does not begin the element's own closing tag. I want it to read all the way to the element's closing tag. It may ignore the inner tags themselves, but it should at least include the text inside them.

I would also like to know how to get the outer XML. For example, if I fetch the element

pugi::xpath_node_set tools = doc.select_nodes("/mesh/bounds/b");

what do I do to get the whole content, which would be " Link Till here"?

This content is the same as the one given below:

#include "pugixml.hpp"

#include <iostream>
#include <string>
#include <conio.h>
#include <stdio.h>

using namespace std;

int main() {
    string source = "<mesh name='sphere'><bounds><b id='hey'> <a DeriveCaptionFrom='lastparam' name='testx' href='http://www.google.com'>Link Till here<b>it will stop here and ignore the rest</b> text</a></b> 0 1 1</bounds></mesh>";

    int from_string;
    from_string = 1;

    pugi::xml_document doc;
    pugi::xml_parse_result result;
    string filename = "xgconsole.xml";
    result = doc.load_buffer(source.c_str(), source.size());
    /* result = doc.load_file(filename.c_str());
    if(!result){
        cout << "File " << filename.c_str() << " couldn't be found" << endl;
        _getch();
        return 0;
    } */

        pugi::xpath_node_set tools = doc.select_nodes("/mesh/bounds/b/a[@href='http://www.google.com' and @DeriveCaptionFrom='lastparam']");

        for (pugi::xpath_node_set::const_iterator it = tools.begin(); it != tools.end(); ++it) {
            pugi::xpath_node node = *it;
            std::cout << "Attribute Href: " << node.node().attribute("href").value() << endl;
            std::cout << "Value: " << node.node().child_value() << endl;
            std::cout << "Name: " << node.node().name() << endl;

        }

    _getch();
    return 0;
}

Here is the output:

Attribute Href: http://www.google.com
Value: Link Till here
Name: a

I hope I was clear enough. Thanks in advance.

Answer

zeuxcg · Mar 22, 2012

My psychic powers tell me you want to know how to get the concatenated text of all children of the node (aka inner text).

The easiest way to do that is to use XPath, like this:

pugi::xml_node node = doc.child("mesh").child("bounds").child("b");
string text = pugi::xpath_query(".").evaluate_string(node);

Obviously you can write your own recursive function that concatenates the PCDATA/CDATA values from the subtree; using a built-in recursive traversing facility, such as find_node, would also work (using C++11 lambda syntax):

string text;
node.find_node([&](pugi::xml_node n) -> bool { if (n.type() == pugi::node_pcdata) text += n.value(); return false; });

Now, if you want to get the entire contents of the tag (aka outer xml), you can output a node to string stream, i.e.:

ostringstream oss;
node.print(oss);
string xml = oss.str();

Getting inner xml will require iterating through the node's children and appending their outer xml to the result, i.e.:

ostringstream oss;
for (pugi::xml_node_iterator it = node.begin(); it != node.end(); ++it)
    it->print(oss);
string xml = oss.str();