How to generate a valid PDF/A file using iText and XMLWorker (HTML to PDF/A process)

Arturo · Sep 1, 2014 · Viewed 7.6k times

I'm currently developing a method that will accept HTML input and convert it into a valid PDF/A file. I know how to programmatically construct a valid PDF/A file using iText (reference: http://itextsupport.com/download/pdfa3.html) but I'm unable to generate a valid PDF/A file using HTML as input and using XMLWorker to transform this input into a PDF file. The problem that I have right now is due to the embedded fonts requirement of the PDF/A format. I always get this exception:

Exception in thread "main" com.itextpdf.text.pdf.PdfAConformanceException: All the fonts must be embedded. This one isn't: Helvetica

I try to control which fonts the HTML input uses via a CSS file, and I register the fonts I want in the output PDF file via the XMLWorkerFontProvider class, but I must be doing something wrong because the exception above is always thrown.

What else do I need to do so that XMLWorker uses the fonts registered via the XMLWorkerFontProvider class? I want to prevent the default font, Helvetica, from being used for any HTML element in the input.

Below is the code I'm using for testing:

style.css (just 1 line):

* { font: normal 100% Arial, sans-serif !important; }

Main.java:

package com.itextpdf;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Reader;
import java.io.StringReader;

import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.ICC_Profile;
import com.itextpdf.text.pdf.PdfAConformanceLevel;
import com.itextpdf.text.pdf.PdfAWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.css.CssFile;
import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
import com.itextpdf.tool.xml.html.CssAppliers;
import com.itextpdf.tool.xml.html.CssAppliersImpl;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

public class Main {

    /**
     * @param args
     */
    public static void main(String[] args) {

        StringBuffer buf = new StringBuffer();

        buf.append("<!DOCTYPE html>");
        buf.append("<html>");
        buf.append("<head>");
        buf.append("<title>Test</title>");
        buf.append("</head>");
        buf.append("<body>");
        buf.append("<p>This is a test</p>");
        buf.append("</body>");
        buf.append("</html>");

        OutputStream file = null;
        Document document = null;
        PdfAWriter writer = null;

        try {

            file = new FileOutputStream(new File("C:\\Users\\amartin\\Desktop\\Test.pdf"));
            document = new Document();
            writer = PdfAWriter.getInstance(document, file, PdfAConformanceLevel.PDF_A_1B);

            // Create XMP metadata. It's a PDF/A requirement.
            writer.createXmpMetadata();

            document.open();

            // Set output intent. PDF/A requirement.
            ICC_Profile icc = ICC_Profile.getInstance(new FileInputStream("./src/main/resources/com/itextpdf/sRGB Color Space Profile.icm"));
            writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);

            // CSS
            CSSResolver cssResolver = new StyleAttrCSSResolver();
            CssFile cssFile = XMLWorkerHelper.getCSS(new FileInputStream("./css/style.css"));
            cssResolver.addCss(cssFile);

            XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider();
            fontProvider.register("./fonts/arial.ttf");
            fontProvider.register("./fonts/sans-serif.ttf");
            fontProvider.addFontSubstitute("lowagie", "garamond");

            CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
            HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
            htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

            // Pipelines
            PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
            HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
            CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

            XMLWorker worker = new XMLWorker(css, true);
            XMLParser p = new XMLParser(worker);

            Reader reader = new StringReader(buf.toString());
            p.parse(reader);

        } catch (Exception e) {

            e.printStackTrace();

        } finally {

            if (document != null && document.isOpen())
                document.close();

            try {

                if (file != null)
                    file.close();

            } catch (IOException e) {}

            if (writer != null && !writer.isCloseStream())
                writer.close();

        }

    }

}

edit:

In response to Bruno: I have extended the FontFactoryImp class, overriding the getFont() method (the overload that takes all the arguments). It calls System.out.println like this:

System.out.println("=fontname: " + fontname + " =encoding: " + encoding + " =embedded : " + embedded + " =size: " + size + " =style: " + style + " =BaseColor: " + color);

and then calls the parent class's getFont() method with the same arguments. The only output I see is this:

=fontname: null =encoding: Cp1252 =embedded : true =size: -1.0 =style: -1 =BaseColor: null
=fontname: null =encoding: Cp1252 =embedded : true =size: -1.0 =style: -1 =BaseColor: null

followed by the PdfAConformanceException quoted at the top of this question.

Answer

Bruno Lowagie · Sep 1, 2014

Based on the output you're sending to System.out (fontname is null on every call), it seems that XML Worker doesn't pick up the font family you want to use.

Please specify the font family like this:

font-family: "Arial"

Using the 'font' shorthand in CSS may work, but it's tricky. I think iText sees 'normal' and interprets it as "use the default font".
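
Applied to the style.css from the question, that suggestion might look like the sketch below. It replaces the 'font' shorthand with an explicit font-family declaration and drops the size, the fallback family and the !important flag to keep the rule minimal. Dropping those is an assumption made here, not something the answer asks for. The family name has to match a font registered with XMLWorkerFontProvider (./fonts/arial.ttf in the question), and the thread does not confirm whether this change alone removes the exception.

```css
/* style.css: name the family explicitly instead of using the 'font' shorthand */
* { font-family: "Arial"; }
```

If the getFont() logging from the question then prints a non-null fontname, the family is reaching the font provider.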