Remove a page number, header and footer from pdf file

Deepti Kakade · Jan 12, 2015 · Viewed 8.2k times

I want to parse a PDF file. To do that I am using the pdftotext utility, which converts a PDF file into a text file. Now I want to remove the page numbers, headers and footers from the resulting text file.

I am converting the PDF file with the following command:

pdftotext -layout input.pdf output.txt

Can anyone help me with this?

Answer

Reinaldo Gil · Jan 26, 2016

You need to crop each page with the parameters -x, -y, -W and -H; at least -y, -W and -H are required.

Example:

pdftotext -y 80 -H 650 -W 1000 -nopgbrk -eol unix example.pdf


-y 80   -> start the crop area 80 pixels below the top of each page (removes the header);
-H 650  -> keep the next 650 pixels below that offset; anything lower is dropped (removes the footer);
-W 1000 -> a width large enough that nothing is cropped horizontally (some value must be specified).

You need to adjust -y and -H for each PDF, for example by reducing -y or increasing -H, until the crop area leaves out exactly the header and footer.
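
To make that adjustment less trial-and-error, the -H value can be derived from the page height instead of guessed. The following is only a sketch: crop_args is a hypothetical helper (not part of pdftotext), and it assumes pdftotext's default resolution of 72 DPI, where one pixel equals one PDF point. The page size can usually be read from the "Page size" line printed by pdfinfo; the header and footer heights are values you still have to estimate for your document.

```shell
#!/bin/sh
# Hypothetical helper: build pdftotext crop options from the page size
# and the heights of the header and footer bands to skip.
# All values are in points (= pixels at the default 72 DPI).
crop_args() {
    page_width=$1
    page_height=$2
    header=$3   # height of the header band at the top of the page
    footer=$4   # height of the footer band at the bottom of the page

    # Whatever is left between header and footer is the body to keep.
    body_height=$((page_height - header - footer))
    if [ "$body_height" -le 0 ]; then
        echo "crop_args: header and footer leave no body area" >&2
        return 1
    fi

    printf '%s\n' "-x 0 -y $header -W $page_width -H $body_height"
}

# Example: US Letter page (612 x 792 pt), 80 pt header, 62 pt footer.
crop_args 612 792 80 62

# Passing the result to pdftotext (left commented out so the sketch runs
# without the tool installed). The expansion is intentionally unquoted so
# the options are split into separate words:
# pdftotext $(crop_args 612 792 80 62) -nopgbrk -eol unix example.pdf
```

With these example numbers the helper yields -y 80 and -H 650, the same values as in the command above; only -W differs, using the real page width rather than an arbitrarily large number.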