Efficient XSLT pipeline in Java (or redirecting Results to Sources)

Chris Scott picture Chris Scott · Aug 21, 2009 · Viewed 7.6k times · Source

I have a series of XSL 2.0 stylesheets that feed into each other, i.e. the output of stylesheet A feeds B feeds C.

What is the most efficient way of doing this? The question rephrased is: how can one efficiently route the output of one transformation into another.

Here's my first attempt:

@Override
public void transform(Source data, Result out) throws TransformerException{
    for(Transformer autobot : autobots){
        if(autobots.indexOf(autobot) != (autobots.size()-1)){
            log.debug("Transforming prelim stylesheet...");
            data = transform(autobot,data);
        }else{
            log.debug("Transforming final stylesheet...");
            autobot.transform(data, out);
        }
    }
}

private Source transform(Transformer autobot, Source data) throws TransformerException{
    DOMResult result = new DOMResult();
    autobot.transform(data, result);
    Node node = result.getNode();
    return new DOMSource(node);
}

As you can see, I'm using a DOM to sit in between transformations, and although it is convenient, it's non-optimal performance wise.

Is there any easy way to route to say, route a SAXResult to a SAXSource? A StAX solution would be another option.

I'm aware of projects like XProc, which is very cool if you haven't taken a look at yet, but I didn't want to invest in a whole framework.

Answer

Mads Hansen picture Mads Hansen · Aug 24, 2009

I found this: #3. Chaining Transformations that shows two ways to use the TransformerFactory to chain transformations, having the results of one transform feed the next transform and then finally output to system out. This avoids the need for an intermediate serialization to String, file, etc. between transforms.

When multiple, successive transformations are required to the same XML document, be sure to avoid unnecessary parsing operations. I frequently run into code that transforms a String to another String, then transforms that String to yet another String. Not only is this slow, but it can consume a significant amount of memory as well, especially if the intermediate Strings aren't allowed to be garbage collected.

Most transformations are based on a series of SAX events. A SAX parser will typically parse an InputStream or another InputSource into SAX events, which can then be fed to a Transformer. Rather than having the Transformer output to a File, String, or another such Result, a SAXResult can be used instead. A SAXResult accepts a ContentHandler, which can pass these SAX events directly to another Transformer, etc.

Here is one approach, and the one I usually prefer as it provides more flexibility for various input and output sources. It also makes it fairly easy to create a transformation chain dynamically and with a variable number of transformations.

SAXTransformerFactory stf = (SAXTransformerFactory)TransformerFactory.newInstance();

// These templates objects could be reused and obtained from elsewhere.
Templates templates1 = stf.newTemplates(new StreamSource(
  getClass().getResourceAsStream("MyStylesheet1.xslt")));
Templates templates2 = stf.newTemplates(new StreamSource(
  getClass().getResourceAsStream("MyStylesheet1.xslt")));

TransformerHandler th1 = stf.newTransformerHandler(templates1);
TransformerHandler th2 = stf.newTransformerHandler(templates2);

th1.setResult(new SAXResult(th2));
th2.setResult(new StreamResult(System.out));

Transformer t = stf.newTransformer();
t.transform(new StreamSource(System.in), new SAXResult(th1));

// th1 feeds th2, which in turn feeds System.out.