How to remove namespaces from XML using XSLT

GaneshT · Mar 11, 2011

I have a 150 MB XML file (sometimes it is even larger) and I need to remove all the namespaces from it. The application is written in Visual Basic 6.0, so I'm using DOM to load the XML. I was skeptical at first, but loading works fine.

I am trying the following XSLT, but it also removes all the attributes. I want to keep all the attributes and elements and remove only the namespaces. Apparently this happens because I have xsl:element but nothing equivalent for attributes. How can I include the attributes?

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes" version="1.0" encoding="UTF-8" />
    <xsl:template match="*">
        <xsl:element name="{local-name()}">
            <xsl:apply-templates select="@* | node()"/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>

Answer

jasso · May 3, 2011

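Your XSLT drops the attributes because it has no template that copies them. <xsl:template match="*"> matches only elements, not attributes (or text, comments or processing instructions). The attributes selected by select="@* | node()" therefore fall through to the built-in template rule, which outputs only their string value as text instead of creating an attribute.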

Below is a stylesheet that removes all namespaces from the processed document but copies everything else: elements, attributes, comments, text and processing instructions. Please pay attention to two things:

  1. Copying the attributes as such is not enough to remove all namespaces. An attribute can belong to a namespace even when its containing element does not. Attributes therefore have to be recreated, just like elements. This is done with the <xsl:attribute> element.
  2. An element cannot have two or more attributes with the same expanded name, but it can have several attributes with the same local name if those attributes are in different namespaces. Removing the namespace from attribute names therefore causes data loss whenever an element has at least two attributes with the same local name: only one of them survives, and the others are overwritten.
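To make the second point concrete, here is a small made-up input. The element name, the prefixes and the urn:example URIs are hypothetical and serve only as an illustration.

```xml
<!-- Hypothetical input: two attributes share the local name "id"
     but live in different namespaces, which is perfectly legal. -->
<item xmlns:a="urn:example:a" xmlns:b="urn:example:b" a:id="1" b:id="2"/>
```

Once the namespaces are stripped, both attributes would be named id. The result element can hold only one of them, so one of the two values is lost. Which one survives depends on the order in which the processor visits the attributes, and XSLT 1.0 does not define that order.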

...and the code:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output indent="yes" method="xml" encoding="utf-8" omit-xml-declaration="yes"/>

    <!-- Stylesheet to remove all namespaces from a document -->
    <!-- NOTE: this will lead to attribute name clash, if an element contains
        two attributes with same local name but different namespace prefix -->
    <!-- Nodes that cannot have a namespace are copied as such -->

    <!-- template to copy elements -->
    <xsl:template match="*">
        <xsl:element name="{local-name()}">
            <xsl:apply-templates select="@* | node()"/>
        </xsl:element>
    </xsl:template>

    <!-- template to copy attributes -->
    <xsl:template match="@*">
        <xsl:attribute name="{local-name()}">
            <xsl:value-of select="."/>
        </xsl:attribute>
    </xsl:template>

    <!-- template to copy the rest of the nodes -->
    <xsl:template match="comment() | text() | processing-instruction()">
        <xsl:copy/>
    </xsl:template>

</xsl:stylesheet>
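As a quick sanity check of the stylesheet above, consider the following small input. It is only a sketch: the element names, attribute names and urn:example URIs are invented for illustration.

```xml
<ns:order xmlns:ns="urn:example:orders" xmlns:x="urn:example:extra" x:status="open">
    <ns:line sku="A1">2</ns:line>
    <!-- comments are copied unchanged -->
</ns:order>
```

An XSLT 1.0 processor should produce roughly the following. The exact whitespace may differ, depending on how the processor handles indent="yes" and whether whitespace is preserved when the input is loaded.

```xml
<order status="open">
    <line sku="A1">2</line>
    <!-- comments are copied unchanged -->
</order>
```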

You could also use <xsl:template match="node()"> instead of that last template, but node() matches elements too and has the same default priority as *, so you should give it a lower priority with the priority attribute to keep elements from matching it.
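A minimal sketch of that variant is shown below. The value -1 is just one possible choice. Any priority below -0.5, the default for match="*", should have the same effect.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output indent="yes" method="xml" encoding="utf-8" omit-xml-declaration="yes"/>

    <!-- elements: recreate under the local name only -->
    <xsl:template match="*">
        <xsl:element name="{local-name()}">
            <xsl:apply-templates select="@* | node()"/>
        </xsl:element>
    </xsl:template>

    <!-- attributes: recreate under the local name only -->
    <xsl:template match="@*">
        <xsl:attribute name="{local-name()}">
            <xsl:value-of select="."/>
        </xsl:attribute>
    </xsl:template>

    <!-- catch-all for comments, text and processing instructions;
         the explicit low priority keeps elements on the template above -->
    <xsl:template match="node()" priority="-1">
        <xsl:copy/>
    </xsl:template>

</xsl:stylesheet>
```

Without the priority attribute, both match="*" and match="node()" would apply to elements with equal priority. A processor may then report an error or pick the template that comes last in the stylesheet. In the latter case <xsl:copy/> would copy each element together with its namespaces and without its children.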