Here's some code I'm having problems with. I process some XML, and in a method of an OO class I extract an element from each of several nodes that repeat in the document. There should only be one such element in the subtree of each node, but my code gets all matching elements, as if it were operating on the document as a whole.
Because I only expected to get one element, I only use the zeroth element of the array. This leads my function to output the wrong value (and it's the same for every item in the document).
Here's some simplified code that illustrates the problem:
$ cat t4.pl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my $xml = <<EndXML;
<Envelope>
<Body>
<Reply>
<List>
<Item>
<Id>8b9a</Id>
<Message>
<Response>
<Identifier>55D</Identifier>
</Response>
</Message>
</Item>
<Item>
<Id>5350</Id>
<Message>
<Response>
<Identifier>56D</Identifier>
</Response>
</Message>
</Item>
</List>
</Reply>
</Body>
</Envelope>
EndXML
my $foo = Foo->new();
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string( $xml );
my @list = $doc->getElementsByTagName( 'Item' );
for my $item ( @list ) {
my $id = get( $item, 'Id' );
my @messages = $item->getElementsByLocalName( 'Message' );
for my $message ( @messages ) {
my @children = $message->getChildNodes();
for my $child ( @children ) {
my $name = $child->nodeName;
if ( $name eq 'Response' ) {
print "child is a Response\n";
$foo->do( $child, $id );
}
elsif ( $name eq 'text' ) {
# ignore whitespace between elements
}
else {
print "child name is '$name'\n";
}
} # child
} # Message
} # Item
# ..............................................
sub get {
my ( $node, $name ) = @_;
my $value = "(Element $name not found)";
my @targets = $node->getElementsByTagName( $name );
if ( @targets ) {
my $target = $targets[0];
$value = $target->textContent;
}
return $value;
}
# ..............................................
package Foo;
sub new {
my $self = {};
bless $self;
return $self;
}
sub do {
my $self = shift;
my ( $node, $id ) = @_;
print '-' x 70, "\n", ' ' x 12, $node->toString( 1 ), "\n", '-' x 70, "\n";
my @identifiers = $node->findnodes( '//Identifier' );
print "do() found ", scalar @identifiers, " Identifiers\n";
print "$id, ", $identifiers[0]->textContent, "\n\n";
}
Here's the output
$ perl t4.pl
child is a Response
----------------------------------------------------------------------
<Response>
<Identifier>55D</Identifier>
</Response>
----------------------------------------------------------------------
do() found 2 Identifiers
8b9a, 55D
child is a Response
----------------------------------------------------------------------
<Response>
<Identifier>56D</Identifier>
</Response>
----------------------------------------------------------------------
do() found 2 Identifiers
5350, 55D
I was expecting
do() found 1 Identifiers
I was expecting the last line to be
5350, 56D
I am using an old version of XML::LibXML due to platform issues.
Q: Does the problem exist in later versions or am I doing something wrong?
From the documentation of XPath 1.0
//para selects all the para descendants of the document root
(emphasis my own). So your call
$node->findnodes( '//Identifier' )
is ignoring the context node $node and searching for all Identifier elements anywhere in the document.
To get all Identifier descendants of the context node, you must add a dot, like this
$node->findnodes('.//Identifier');
but since $node is always a Response element, and Identifier is a direct child of Response, you can just write
$node->findnodes('Identifier');
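The difference between the three forms can be checked with a small script. This is only a sketch: it uses a cut-down version of the XML from the question, and the names @responses, %count and the choice of the second Response as context node are my own, for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Cut-down version of the document from the question
my $xml = <<'EndXML';
<List>
  <Item><Message><Response><Identifier>55D</Identifier></Response></Message></Item>
  <Item><Message><Response><Identifier>56D</Identifier></Response></Message></Item>
</List>
EndXML

my $doc = XML::LibXML->new->parse_string($xml);

# Use the second Response element as the context node
my @responses = $doc->findnodes('//Response');
my $response  = $responses[1];

my %count;
for my $xpath ( '//Identifier', './/Identifier', 'Identifier' ) {
    # '//...' starts at the document root, the other two start at $response
    my @found = $response->findnodes($xpath);
    $count{$xpath} = scalar @found;
    printf "%s found %d: %s\n", $xpath, scalar @found,
        join( ', ', map { $_->textContent } @found );
}
```

The first expression should report both Identifier elements in the document, while the two relative forms should report only the one inside the chosen Response.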
You seem to have got yourself a little tied up writing this. I know you have cut the code down as an example, but do you really need the separate package? Much can be done with judicious application of XPath.
The most obvious change is that you don't need to loop through all children - you can simply pick out the ones you're interested in.
This refactored code may be worth reading
use strict;
use warnings;
use XML::LibXML;
my $parser = XML::LibXML->new;
my $doc = $parser->parse_fh(*DATA);
for my $item ( $doc->findnodes('//Item') ) {
print "\n";
my $id = $item->findvalue('Id');
printf "Item Id: %s\n", $id;
my @messages = $item->findnodes('Message');
for my $message (@messages) {
my ($response) = $message->findnodes('Response');
printf "Response Identifier: %s\n", $response->findvalue('Identifier');
}
}
__DATA__
<Envelope>
<Body>
<Reply>
<List>
<Item>
<Id>8b9a</Id>
<Message>
<Response>
<Identifier>55D</Identifier>
</Response>
</Message>
</Item>
<Item>
<Id>5350</Id>
<Message>
<Response>
<Identifier>56D</Identifier>
</Response>
</Message>
</Item>
</List>
</Reply>
</Body>
</Envelope>
output
Item Id: 8b9a
Response Identifier: 55D
Item Id: 5350
Response Identifier: 56D
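One caveat with the code above: if a Message has no Response, $response is undefined and the findvalue call on it dies. A possible variant, assuming you would rather skip such items, is to ask for the Identifier elements with one relative path per Item. The sample input with an empty Message and the @pairs array are hypothetical, made up for this sketch.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input: the second Item has no Response at all
my $xml = <<'EndXML';
<List>
  <Item><Id>8b9a</Id><Message><Response><Identifier>55D</Identifier></Response></Message></Item>
  <Item><Id>5350</Id><Message/></Item>
</List>
EndXML

my $doc = XML::LibXML->new->parse_string($xml);

my @pairs;
for my $item ( $doc->findnodes('//Item') ) {
    my $id = $item->findvalue('Id');

    # Relative path: only Identifier elements below this Item are matched
    my @identifiers = $item->findnodes('Message/Response/Identifier');
    if ( !@identifiers ) {
        warn "Item $id has no Identifier\n";
        next;
    }
    push @pairs, [ $id, $_->textContent ] for @identifiers;
}

# Print one "id, identifier" line per pair that was found
printf "%s, %s\n", @$_ for @pairs;
```

With this input only the first Item yields a pair, and a warning is issued for the second.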