Form field values set with PDFBox not visible in Adobe Reader

SpaceGerbil · Jun 10, 2014

I am having an issue setting some form fields using Apache PDFBox (1.8.5). I have a few different static PDFs that I am using for testing. Using the following code, I can set the values of form fields and save the resulting PDF. I can then open this PDF in Adobe Reader and see the results:

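import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextbox;

// pdfDocument is the already loaded PDDocument
pdfDocument.setAllSecurityToBeRemoved(true);
PDDocumentCatalog docCatalog = pdfDocument.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();

// getFields() returns a raw List in PDFBox 1.8.x, hence the cast
List fields = acroForm.getFields();
for (Object entry : fields) {
    PDField field = (PDField) entry;
    // Only text fields are filled; other field types are left untouched
    if (field instanceof PDTextbox) {
        ((PDTextbox) field).setValue("STATIC PDFBOX EDIT");
    }
}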

I then save the form. For static PDFs of:

  • PDF Version: 1.6 (Acrobat 7.x)
  • PDF Version: 1.7 (Acrobat 8.x)

This works just fine. I can open the documents in Adobe Reader XI and see the correct values in the form.

For static PDFs of:

  • PDF Version: 1.7 Adobe Extension Level 3 (Acrobat 9.x)
  • PDF Version: 1.7 Adobe Extension Level 8 (Acrobat X)
  • PDF Version: 1.7 Adobe Extension Level 11 (Acrobat XI)

This does not appear to work. When I open the resulting forms in Adobe Reader XI, the fields do not appear to be populated. But if I open the PDF in my Firefox or Chrome browser's PDF viewer, the fields show as populated there.

How can I set these fields so the values will appear when viewed in Adobe Reader XI?

EDIT: Sample PDFs can be found here: https://github.com/bamundson/PDFExample

Answer

mkl · Jun 12, 2014

The major difference between your PDFs is the form technology used:

  • Test_9.pdf uses good old-fashioned AcroForm forms;
  • Test_10.pdf and Test_11.pdf, on the other hand, use a hybrid form with both an AcroForm representation and an XFA (Adobe XML Forms Architecture) representation.

XFA-aware PDF viewers (foremost Adobe Reader and Adobe Acrobat) use the XFA information from the file, while XFA-unaware viewers (most others) use the AcroForm information.

PDFBox is mostly XFA-unaware. In particular, the PDField objects returned by PDAcroForm.getFields() only represent the AcroForm information. Thus, your ((PDTextbox)field).setValue("STATIC PDFBOX EDIT") calls only influence the AcroForm representation of the form.

This explains your observation:

When I open the resulting forms in Adobe Reader XI, the fields do not appear to be populated. But If I open the PDF in my Firefox or Chrome browser's PDF viewer, the fields show as populated there.

(As far as I know, the integrated PDF viewers of Firefox and Chrome are XFA-unaware.)

So,

How can I set these fields so the values will appear when viewed in Adobe Reader XI?

There essentially are two ways:

  1. Remove the XFA entry from the AcroForm dictionary:

    acroForm.setXFA(null);
    

    If there is no XFA, Adobe Reader will use the AcroForm form information, too.

  2. Edit both the AcroForm and the XFA information. You can retrieve the XFA information using

    PDXFAResource xr = acroForm.getXFA();
    

    and extract the underlying XML using

    xr.getDocument()
    

    Then you can edit the XML and put the result into a stream, wrap that stream in a PDXFAResource, and set it on the form using acroForm.setXFA(...).
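The XML editing step of option 2 can be sketched with the JDK's own DOM API, independent of PDFBox. This is only an illustration under assumptions: the class `XfaDataEditor` and the method `setFieldValue` are hypothetical names, and the sketch assumes that the XFA XML contains a `datasets` packet (namespace `http://www.xfa.org/schema/xfa-data/1.0/`) in which the data element carries the same name as the form field. It does not create missing data elements and does not touch the template packet.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of PDFBox
final class XfaDataEditor {
    // Namespace of the XFA datasets packet
    static final String XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/";

    /**
     * Parses the XFA XML, sets the text of every element named fieldName
     * inside a datasets packet, and returns the re-serialized XML.
     * Assumption: data elements are named after the fields they bind to.
     */
    static byte[] setFieldValue(byte[] xfaXml, String fieldName, String value) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xfaXml));

            // Restrict the search to the datasets packet(s)
            NodeList datasets = doc.getElementsByTagNameNS(XFA_DATA_NS, "datasets");
            for (int i = 0; i < datasets.getLength(); i++) {
                Element packet = (Element) datasets.item(i);
                NodeList matches = packet.getElementsByTagName(fieldName);
                for (int j = 0; j < matches.getLength(); j++) {
                    matches.item(j).setTextContent(value);
                }
            }

            // Serialize the edited DOM back to bytes
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XFA XML", e);
        }
    }
}
```

The input bytes would be the XML obtained from the XFA resource, and the returned bytes are what would go into the new stream that is set via setXFA(...). Note that the XFA entry of a PDF may also be an array of separate packets instead of a single stream; in that case the packet holding the datasets would have to be located first.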

While option 1 is certainly much easier to implement, it only works for hybrid documents. If you will also have to edit pure XFA forms, you'll need to implement option 2.

Writing new field values to these PDFs works fine with the latest version of iText

iText has a certain degree of explicit support for XFA forms.