I want to parse a PDF file. To do that, I am using the pdftotext utility, which converts a PDF file into a text file. Now I want to remove the page numbers, headers, and footers from the text file.
I am converting the PDF file with the following command:
pdftotext -layout input.pdf output.txt
Can anyone help me on this?
You need to crop the page with the -x, -y, -W and -H parameters; at a minimum, -y, -W and -H.
Example:
pdftotext -y 80 -H 650 -W 1000 -nopgbrk -eol unix example.pdf
-y 80 -> start the crop area 80 pixels below the top of each page (removes the header);
-H 650 -> keep 650 pixels of height below the point set by -y (removes the footer);
-W 1000 -> a high value so that nothing is cropped horizontally (a width has to be specified);
You need to adjust -y and -H for each PDF, for example by reducing -y or increasing -H, until the crop area fits between the header and the footer.
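The relationship between -y and -H can be worked out from the page height. Here is a minimal sketch, assuming pdftotext's default resolution of 72 DPI (so one pixel equals one point) and a US Letter page that is 792 points high. The crop_args function is a hypothetical helper for illustration, not something pdftotext provides, and the header and footer heights are example values.

```shell
#!/bin/sh
# Hypothetical helper (not part of pdftotext): build the crop options
# from the page height and the header/footer heights, all in pixels.
# Assumes the default 72 DPI resolution, where 1 pixel = 1 point.
crop_args() {
    page_height=$1    # e.g. 792 for a US Letter page (assumption)
    header=$2         # height of the header band to drop
    footer=$3         # height of the footer band to drop
    width=${4:-1000}  # wide enough to keep the full line width
    # -H is whatever remains once the header and footer are removed.
    echo "-y $header -H $((page_height - header - footer)) -W $width"
}

# 792 - 80 - 62 = 650, which matches the example command above.
crop_args 792 80 62   # prints: -y 80 -H 650 -W 1000
```

The result could then be passed on, for example as pdftotext $(crop_args 792 80 62) -nopgbrk -eol unix example.pdf. The actual page height of a given file can be checked with pdfinfo, which reports the page size in points.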