How can I read a PDF in Python?

sg1994 · Aug 21, 2017 · Viewed 184.3k times

How can I read a PDF in Python? I know one way is to convert it to text first, but I want to read the content directly from the PDF.

Can anyone explain which Python module is best for PDF extraction?

Answer

shankarj67 · Aug 21, 2017

You can use the PyPDF2 package:

# install PyPDF2
pip install PyPDF2

# import the required module
import PyPDF2

# open the PDF in binary mode; the with block closes it afterwards
with open('example.pdf', 'rb') as file:
    # create a PDF reader object (PdfFileReader was removed in PyPDF2 3.0)
    fileReader = PyPDF2.PdfReader(file)

    # print the number of pages in the PDF file
    print(len(fileReader.pages))
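
Since the question asks about reading the content rather than just counting pages, here is a minimal sketch of text extraction. It assumes PyPDF2 2.x or later, where the reader exposes `pages` and each page has `extract_text()`. The helper names `pages_to_text` and `read_pdf_text` are made up for this example and are not part of PyPDF2. Extraction quality depends on how the PDF was produced, and scanned pages usually yield no text at all.

```python
def pages_to_text(pages):
    """Join the extracted text of each page, one page per line group.

    `pages` is any iterable of objects with an extract_text() method,
    such as PdfReader.pages in PyPDF2 2.x+ (assumed API version).
    """
    # extract_text() may return None or '' for pages without a text layer
    return "\n".join(page.extract_text() or "" for page in pages)


def read_pdf_text(path):
    """Hypothetical helper: open the PDF at `path` and return all of its text."""
    import PyPDF2  # imported here so the helper above works without PyPDF2

    with open(path, "rb") as f:
        reader = PyPDF2.PdfReader(f)
        return pages_to_text(reader.pages)
```

With PyPDF2 installed, print(read_pdf_text('example.pdf')) should print whatever text the library manages to recover from each page.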

For more, see the documentation: http://pythonhosted.org/PyPDF2/