XMLStarlet does not select anything

uk4sx · Jan 26, 2012

I have a typical pom.xml and want to print the groupId, artifactId and version, separated by colons. I think XMLStarlet is the right tool for that. I tried several ways, but I always get an empty line.

xml sel -t -m project -v groupId -o : -v artifactId -o : -v version pom.xml

Expected output:

org.something.apps:app-acct:5.4

Actual output: an empty line

Even if I try to print just the groupId I get nothing:

xml sel -t -v project/groupId pom.xml

I am sure that the tool sees the elements, because I can list them without any problem:

xml el pom.xml

prints the following (correctly):

project
project/modelVersion
project/parent
project/parent/groupId
project/parent/artifactId
project/parent/version
project/groupId
project/artifactId
project/version
project/packaging

What's wrong?

Here is the cut-down version of pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                        http://maven.apache.org/maven-v4_0_0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.something</groupId>
        <artifactId>base</artifactId>
        <version>1.16</version>
    </parent>

    <groupId>org.something.apps</groupId>
    <artifactId>app-acct</artifactId>
    <version>5.4</version>
    <packaging>war</packaging>

</project>

Answer

uk4sx · Jan 27, 2012

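Unfortunately, XMLStarlet (like any XPath 1.0 processor) is very picky about the default namespace: an unprefixed name such as project matches only an element that is in no namespace, while every element in this pom.xml inherits the default namespace http://maven.apache.org/POM/4.0.0. If the document declares a default namespace (xmlns=), you have to declare it for XMLStarlet too, and prefix the elements with the name you have chosen: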

xml sel -N my=http://maven.apache.org/POM/4.0.0 -t -m my:project -v my:groupId -o : -v my:artifactId -o : -v my:version pom.xml

Running the above command gives the expected output:

org.something.apps:app-acct:5.4

However, if the document does NOT have the default namespace declared (or the namespace has a slightly different URL), the above command will NOT work, which is a real PITA. A more universal solution is to remove the default namespace declaration before selecting the elements. As of XMLStarlet 1.3.1, converting the XML to PYX format and back removes the namespace declarations:

xml pyx pom.xml | xml p2x | xml sel -t -m project -v groupId -o : -v artifactId -o : -v version 2>nul

UPDATE (2014-02-12): as of XMLStarlet 1.4.2 the PYX <-> XML conversion is fixed (it no longer removes namespace declarations), so the above command no longer works (thanks to Peter Gluck for the tip). Use the following command instead:

xml pyx pom.xml | grep -v ^A | xml p2x | xml sel -t -m project -v groupId -o : -v artifactId -o : -v version

Note: the grep above removes ALL attributes from the document, not just namespace declarations. For this specific case (selecting element values from pom.xml, where elements in non-default namespaces are not expected) that is OK, but for general XML you should remove only the default namespace declaration(s) and nothing else:

xml pyx pom.xml | grep -v "^Axmlns " | xml p2x | xml sel -t -m project -v groupId -o : -v artifactId -o : -v version


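Note (obsolete, applies to the XMLStarlet 1.3.1 command): the error redirection (2>nul on Windows, 2>/dev/null on Unix-like systems) was necessary to hide the complaint about the namespace prefix xsi, which was no longer declared after the PYX round trip: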

-:1.28: Namespace prefix xsi for schemaLocation on project is not defined

Another way of getting rid of the complaint is to remove the schemaLocation attribute (actually, this command removes all attributes from the PYX document, not just xsi:schemaLocation):

xml pyx pom.xml | grep -v ^A | xml p2x | xml sel -t -m project -v groupId -o : -v artifactId -o : -v version