I used the following code to read a PDF file, but it does not extract any text. What could be the reason?
>>> import os
>>> from PyPDF2 import PdfFileReader, PdfFileWriter
>>> path = "/Users/Rahul/Desktop/Dfiles/"
>>> dirs = os.listdir( path )
>>> directory = "/Users/Rahul/Desktop/Dfiles/106_2015_34-76357.pdf"
>>> f = open(directory, 'rb')
>>> reader = PdfFileReader(f)
>>> contents = reader.getPage(0).extractText().split('\n')
>>> f.close()
>>> print contents
The output is [u''] instead of the text of the page.
import re
import PyPDF2

pdfFileObj = open('E://drive-download-20171015T225604Z-001/test_case/test2/try/xyz.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
print("Number of pages:-" + str(pdfReader.numPages))
num = pdfReader.numPages
i = 0
while i < num:
    pageObj = pdfReader.getPage(i)
    text = pageObj.extractText()
    text1 = text.lower()
    # Iterate over lines; looping over the string itself yields single characters
    for line in text1.splitlines():
        if re.search("abc", line):
            print(line)
    i = i + 1
pdfFileObj.close()
I use this to go through a PDF page by page, search each page for key terms, and process the matches further.
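The per-page search can also be separated from the PDF reading, which makes it easy to check which pages came back empty. The following is only a minimal sketch: `find_matching_lines`, `empty_pages`, and `sample_pages` are hypothetical names, and the sample strings stand in for what `extractText()` would return for each page. One common cause of an empty result such as [u''] is a page that holds a scanned image with no text layer, in which case there is nothing to extract and an OCR step would be needed first.

```python
import re


def find_matching_lines(page_texts, pattern):
    """Yield (page_number, line) for every line that matches `pattern`.

    `page_texts` is any iterable of strings, one per page (hypothetical
    input: in practice these would come from extractText()).
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for page_number, text in enumerate(page_texts, start=1):
        for line in text.splitlines():
            if regex.search(line):
                yield page_number, line.strip()


def empty_pages(page_texts):
    """Return the 1-based numbers of pages whose extracted text is blank."""
    return [n for n, text in enumerate(page_texts, start=1) if not text.strip()]


# Stand-in data (assumption): three "pages", the second with no text at all.
sample_pages = ["Intro\nABC Corp report", "", "totals\nsee abc appendix\n"]

matches = list(find_matching_lines(sample_pages, "abc"))
blank = empty_pages(sample_pages)
print(matches)
print(blank)
```

With a real file, the list of page strings could be built from the reader in the snippet above, for example `[pdfReader.getPage(i).extractText() for i in range(pdfReader.numPages)]`. If every page shows up in `blank`, the problem is most likely in the PDF itself rather than in the search loop.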