I am using pdfminer to extract data from pdf files using python. I would like to extract all the data present in pdf irrespective of wheather it is an image or text or whatever it is. Can we do that in a single line(or two if needed, without much work). Any help is appreciated. Thanks in advance
Can we do that in a single line(or two if needed, without much work).
No, you cannot. Pdfminer is powerful but it's rather low-level.
Unfortunately, the documentation is not exactly exhaustive. I was able to find my way around it thanks to some code by Denis Papathanasiou. The code is discussed in his blog, and you can find the source here: layout_scanner.py
See also this answer, where I give a little more detail.