Extracting entire pdf data with python pdfminer

python pdf-reader

sunil reddy · Jun 8, 2013 · Viewed 10.8k times · Source

I am using pdfminer to extract data from pdf files using python. I would like to extract all the data present in pdf irrespective of wheather it is an image or text or whatever it is. Can we do that in a single line(or two if needed, without much work). Any help is appreciated. Thanks in advance

Answer

Can we do that in a single line(or two if needed, without much work).

No, you cannot. Pdfminer is powerful but it's rather low-level.

Unfortunately, the documentation is not exactly exhaustive. I was able to find my way around it thanks to some code by Denis Papathanasiou. The code is discussed in his blog, and you can find the source here: layout_scanner.py

See also this answer, where I give a little more detail.

Extracting entire pdf data with python pdfminer

Answer

Related questions