Parsing an XML file with Perl XML::Simple

vobelic · Apr 23, 2013 · Viewed 9.7k times

I'm trying to parse an XML-like file with the structure shown below.

Edit: I tried to omit most of the huge XML file to simplify things, but copied and pasted it wrongly. Here's the full file (900 KB) that actually has this issue: https://docs.google.com/file/d/0B3ustNI1qZh1UURrYWZJQk0wVlU/edit?usp=sharing

<CIM CIMVERSION="2.0" DTDVERSION="2.0">

  <DECLARATION>
    <DECLGROUP>
      <LOCALNAMESPACEPATH>
        <NAMESPACE NAME="signalingsystem"/>
      </LOCALNAMESPACEPATH>

      <VALUE.OBJECT>
        <INSTANCE CLASSNAME="SharedGtTranslator">
          <PROPERTY NAME="Name" TYPE="string">
            <VALUE>AUC$4,1,6,4,26202*-->AUC RemoteSPC: 300 SSN: 10</VALUE>
          </PROPERTY>
          <PROPERTY NAME="NatureOfAddress" TYPE="sint32">
            <VALUE>4</VALUE>
          </PROPERTY>
        </INSTANCE>
      </VALUE.OBJECT>

      <VALUE.OBJECT>
        <INSTANCE CLASSNAME="SharedGtTranslator">
          <PROPERTY NAME="Name" TYPE="string">
            <VALUE>AUC$4,2,6,4,26202*-->AUC AUC LocalSPC: 410 SSN: 10</VALUE>
          </PROPERTY>
          <PROPERTY NAME="NatureOfAddress" TYPE="sint32">
            <VALUE>4</VALUE>
          </PROPERTY>
            <VALUE>2</VALUE>
          </PROPERTY>
        </INSTANCE>
      </VALUE.OBJECT>
    </DECLGROUP>

  </DECLARATION>
</CIM>

I'm using XML::Simple to parse that structure. I need the VALUE of every PROPERTY with NAME="Name" inside each INSTANCE with CLASSNAME="SharedGtTranslator".

This is what I'm trying to do:

#!/usr/bin/perl
use strict;
use warnings;
# use module
use XML::Simple;
use Data::Dumper;

my $file1 = $ARGV[0];
# create object
my $xml = new XML::Simple;

# read XML file
my $data = $xml->XMLin($file1);
foreach my $object (@{$data->{DECLARATION}->{DECLGROUP}->{'VALUE.OBJECT'}}) {
        if ($object->{INSTANCE}->{CLASSNAME} eq 'SharedGtTranslator') {
                foreach my $property (@{$object->{INSTANCE}->{PROPERTY}}) {
                        if ($property->{NAME} eq 'Name') {
                                print $property->{VALUE} . "\n";
                        }
                }

        }
}

When I run it I get

"Pseudo-hashes are deprecated"

and nothing is printed.

Help is highly appreciated!

Answer

Borodin · Apr 23, 2013

Your code works fine for me as it stands. Is that the full program? There is no use of pseudo-hashes in that code.

The only problem I can see is that your XML data isn't well-formed. There is a spurious

  <VALUE>2</VALUE>
</PROPERTY>

at the end of the last INSTANCE element. Once this is fixed your program runs fine.

XML::Simple seems to be working for you, so it's probably appropriate to stick with it. But I don't generally recommend that people use this module. It can be far from simple to get working, and the structure it builds doesn't fully reflect the XML data, so something like XML::Twig or XML::LibXML is often much better.


Update

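Working with your real data, the structure generated by XML::Simple looks quite unlike what is generated for the short example. There are arrays intermingled with the hashes that weren't there before. In particular, DECLGROUP is now an array reference, and treating an array reference as a hash reference is most likely what makes older versions of Perl report "Pseudo-hashes are deprecated".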

This program seems to generate what you need. It produces 170 lines of output.

use strict;
use warnings;

use XML::Simple;

my $file1 = 'active_7v19.om.cim';

my $xml  = XML::Simple->new;
my $data = $xml->XMLin($file1);

for my $declgroup (@{ $data->{DECLARATION}{DECLGROUP} }) {

    foreach my $object (@{ $declgroup->{'VALUE.OBJECT'} }) {

        my $instance   = $object->{INSTANCE};
        my $classname  = $instance->{CLASSNAME};
        my $properties = $instance->{PROPERTY};

        next unless $classname eq 'SharedGtTranslator';

        for my $property (@$properties) {

            my $name  = $property->{NAME};
            my $value = $property->{VALUE};

            print $value, "\n" if $name eq 'Name';
        }
    }
}
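
If you do stay with XML::Simple, one way to stop the structure changing shape between small and large files is to tell it which elements should always be arrays. The following is only a sketch under assumptions: the inline XML and the value "example-name" are made up for illustration, and the element list passed to ForceArray is a guess based on the loops above rather than on the full file.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical cut-down sample; the real file has many more elements.
my $xml_text = <<'XML';
<CIM>
  <DECLARATION>
    <DECLGROUP>
      <VALUE.OBJECT>
        <INSTANCE CLASSNAME="SharedGtTranslator">
          <PROPERTY NAME="Name" TYPE="string">
            <VALUE>example-name</VALUE>
          </PROPERTY>
        </INSTANCE>
      </VALUE.OBJECT>
    </DECLGROUP>
  </DECLARATION>
</CIM>
XML

# ForceArray makes the listed elements array refs even when they occur once;
# KeyAttr => [] turns off folding of arrays into hashes by attribute value.
my $data = XMLin(
    $xml_text,
    ForceArray => [ 'DECLGROUP', 'VALUE.OBJECT', 'PROPERTY' ],
    KeyAttr    => [],
);

my @names;
for my $declgroup (@{ $data->{DECLARATION}{DECLGROUP} }) {
    for my $object (@{ $declgroup->{'VALUE.OBJECT'} }) {
        my $instance = $object->{INSTANCE};
        next unless $instance->{CLASSNAME} eq 'SharedGtTranslator';
        for my $property (@{ $instance->{PROPERTY} }) {
            push @names, $property->{VALUE} if $property->{NAME} eq 'Name';
        }
    }
}

print "$_\n" for @names;
```

With these options the same loops should work whether an element appears once or many times, because the code no longer depends on XML::Simple guessing between a hash and an array.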

However, I am more sure now that you would be better off with a "real" XML library. This code uses XML::LibXML to produce the same output.

use strict;
use warnings;

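use XML::LibXML;

my $file1 = 'active_7v19.om.cim';

# Parse the file, discarding whitespace-only text nodes
my $doc = XML::LibXML->load_xml(location => $file1, no_blanks => 1);

# Select every "Name" PROPERTY of a SharedGtTranslator INSTANCE
my @properties = $doc->findnodes('//INSTANCE[@CLASSNAME = "SharedGtTranslator"]/PROPERTY[@NAME = "Name"]');

for my $property (@properties) {
    # findvalue returns the string value of the VALUE child
    print $property->findvalue('VALUE'), "\n";
}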

All the work is done by the XPath expression, which selects every PROPERTY element with a NAME attribute of Name that is a child of an INSTANCE element, anywhere in the document, whose CLASSNAME attribute is SharedGtTranslator. The for loop then prints the value of the VALUE element within each PROPERTY. This version is a lot more concise, faster to run, and more flexible if you need to extract different information.
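
To illustrate the flexibility point, here is a minimal sketch that pulls two properties per INSTANCE by changing only the XPath expressions. The inline XML and its values are invented for the example, and pairing Name with NatureOfAddress is just an assumed use case, not something the question asks for.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical cut-down sample standing in for the real file.
my $xml_text = <<'XML';
<CIM>
  <INSTANCE CLASSNAME="SharedGtTranslator">
    <PROPERTY NAME="Name" TYPE="string"><VALUE>example-name</VALUE></PROPERTY>
    <PROPERTY NAME="NatureOfAddress" TYPE="sint32"><VALUE>4</VALUE></PROPERTY>
  </INSTANCE>
</CIM>
XML

my $doc = XML::LibXML->load_xml(string => $xml_text);

my @pairs;
for my $instance ($doc->findnodes('//INSTANCE[@CLASSNAME = "SharedGtTranslator"]')) {
    # Relative XPath expressions are evaluated against the current INSTANCE
    my $name   = $instance->findvalue('PROPERTY[@NAME = "Name"]/VALUE');
    my $nature = $instance->findvalue('PROPERTY[@NAME = "NatureOfAddress"]/VALUE');
    push @pairs, "$name\t$nature";
}

print "$_\n" for @pairs;
```

Doing the same with XML::Simple would mean adding another level of conditionals inside the nested loops, whereas here each extra field is one more findvalue call.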