How to export pdf form fields to xml automatically

Michael picture Michael · Jan 9, 2014 · Viewed 22.3k times · Source

I have a pdf file including form fields and need to export the data into a xml file AUTOMATICALLY. Here is a screen of a sample form I created for testing:

enter image description here

Note: It works great exporting it MANUALLY using Acrobat Professional by clicking on Tools > Form > Export Form Data and finally chose xml extension for file output. This is the result I'm getting when I export it manually:

<?xml version="1.0" encoding="UTF-8"?>
<fields>
    <first_name>John</first_name>
    <last_name>Doe</last_name>
</fields>

However, I need to automate it, e.g. with a python script, Java implementation or some command line tools. Any ideas which libraries or tools I could use to export form field data to xml? The tool or library should be open source, that I can integrate it in my workflow.

I already tried python pdfminer library, which helped me to export static parts (like Static form header, First name: and Last name:) of the pdf file: But how to export form field data (in my case the content of the form fields first_name and last_name)??

EDIT: Feel free to download the sample.pdf file here.

Answer

jimmyp.smith picture jimmyp.smith · Jan 23, 2014

How about Apache PDFBox? It is open source and could fit your needs, since the website says "Extract forms data from PDF forms or prefill a PDF form."

EDIT: Check out the PrintFields example.