I have a PDF file with form fields and need to export the field data to an XML file automatically. Here is a screenshot of a sample form I created for testing:
Note: Exporting manually works fine in Acrobat Professional: I click Tools > Form > Export Form Data and choose the XML extension for the output file. This is the result I get when I export it manually:
<?xml version="1.0" encoding="UTF-8"?>
<fields>
<first_name>John</first_name>
<last_name>Doe</last_name>
</fields>
However, I need to automate this, e.g. with a Python script, a Java implementation, or a command-line tool. Any ideas which libraries or tools I could use to export form field data to XML? The tool or library should be open source so that I can integrate it into my workflow.
I already tried the Python pdfminer library, which helped me export the static parts of the PDF file (like "Static form header", "First name:" and "Last name:"). But how do I export the form field data (in my case, the content of the form fields first_name and last_name)?
EDIT: Feel free to download the sample.pdf file here.
How about Apache PDFBox? It is open source and could fit your needs, since the website says "Extract forms data from PDF forms or prefill a PDF form."
EDIT: Check out the PrintFields example.