I am trying to open a PDF to get its number of pages. I am using PyPDF2.
Here is my code:
import PyPDF2

def pdfPageReader(fileName):
    try:
        # The with block closes the file even if reading fails.
        with open(fileName, 'rb') as pdf_file:
            read_pdf = PyPDF2.PdfFileReader(pdf_file, strict=True)
            number_of_pages = read_pdf.getNumPages()
        print(str(fileName) + " = " + str(number_of_pages))
        return number_of_pages
    except:
        # A bare except also catches KeyboardInterrupt (Ctrl+C).
        return "1"
But then I run into this warning:
PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]
I tried both strict=True and strict=False. With strict=True, the warning above is printed and then nothing else happens; I waited 30 minutes with no result. With strict=False, nothing is printed at all and the call simply hangs. If I press Ctrl+C in the terminal (cmd, Windows 10), the read of that file is cancelled and the script continues with the next one (I run this over a batch of PDF files). Only one file in the batch has this problem.
My questions are: how do I fix this, how do I skip this file, or how can I cancel the read and move on to the other PDF files?
In case somebody has a similar problem where the program even crashes with this error message:
  File "C:\Programy\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1604, in getObject
    % (indirectReference.idnum, indirectReference.generation, idnum, generation))
PyPDF2.utils.PdfReadError: Expected object ID (14 0) does not match actual (13 0); xref table not zero-indexed.
What helped me was setting the strict argument to False when creating the PDF reader:
pdf_reader = PdfFileReader(input_file, strict=False)
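If strict=False stops the crash but a particular file still hangs, as described in the question, one possible workaround is to count the pages in a child process and give up after a timeout. The following is only a sketch under a few assumptions: it reuses the same PyPDF2 calls as the question (PdfFileReader, getNumPages), and the names count_pages_with_timeout, timeout_seconds and child_code are made up for this example, not part of PyPDF2. It does not repair the broken xref table; it only lets a batch move on.

```python
import subprocess
import sys

# Script run in the child process: prints the page count of the PDF
# whose path is passed as the first command-line argument.
# Assumes the same PyPDF2 API as used in the question.
_CHILD_CODE = """
import sys
import PyPDF2
with open(sys.argv[1], 'rb') as f:
    print(PyPDF2.PdfFileReader(f, strict=False).getNumPages())
"""


def count_pages_with_timeout(file_name, timeout_seconds=60, child_code=_CHILD_CODE):
    """Return the page count, or None if the child times out or fails.

    child_code is a hypothetical hook so the helper can be tried
    without a real PDF; normally the default is used.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", child_code, file_name],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        # The child raised an error (for example PdfReadError).
        return None
    try:
        return int(result.stdout.strip())
    except ValueError:
        # The child printed something other than a single number.
        return None
```

In a batch, each file can then be passed to count_pages_with_timeout(path, timeout_seconds=60), and any file for which it returns None can be logged and skipped. Starting a new interpreter per file is slower than reading in-process, but a stuck read can be killed reliably, which a plain try/except cannot do.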