Xref table not zero-indexed. ID numbers for objects will be corrected. won't continue

JBin · Apr 20, 2018 · Viewed 16.5k times

I am trying to open a pdf to get the number of pages. I am using PyPDF2.

Here is my code:

import PyPDF2

def pdfPageReader(fileName):
    try:
        # The with-block closes the file even if reading fails
        with open(fileName, 'rb') as pdf_file:
            read_pdf = PyPDF2.PdfFileReader(pdf_file, strict=True)
            number_of_pages = read_pdf.getNumPages()
        print(str(fileName) + " = " + str(number_of_pages))
        return number_of_pages
    except:
        # A bare except also catches KeyboardInterrupt (Ctrl+C)
        return "1"

But then I run into this warning:

PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]

I tried both strict=True and strict=False. With strict=True, it displays this message and then nothing else happens; I waited 30 minutes. With strict=False, it displays nothing at all and also never finishes. If I press Ctrl+C in the terminal (cmd, Windows 10), it cancels that file and continues with the next one (I run this over a batch of PDF files). Only one file in the batch has this problem.

My questions are: how do I fix this, how do I skip this file, or how can I cancel it and move on to the other PDF files?

Answer

DovaX · Jan 30, 2020

If anybody has a similar problem where the program even crashes with this error message:

  File "C:\Programy\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1604, in getObject
    % (indirectReference.idnum, indirectReference.generation, idnum, generation))
PyPDF2.utils.PdfReadError: Expected object ID (14 0) does not match actual (13 0); xref table not zero-indexed.

it helped me to pass strict=False to my PDF reader:

pdf_reader = PdfFileReader(input_file, strict=False)
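Note that strict=False only addresses the PdfReadError crash. In the question, the reader never returns for one file even with strict=False, so a batch job may also want a way to give up on a file. One possible approach, sketched below and not something PyPDF2 itself provides, is to count pages in a child Python process and kill it after a timeout. The helper name count_pages_with_timeout, the script parameter, and the 60-second default are assumptions made for illustration; PdfFileReader and getNumPages are the PyPDF2 1.x names used in the question.

```python
import subprocess
import sys

# Script run in the child process: prints the page count of the PDF in argv[1].
# Uses the PyPDF2 1.x API from the question (PdfFileReader, getNumPages).
_COUNT_SCRIPT = """
import sys
import PyPDF2
with open(sys.argv[1], 'rb') as f:
    reader = PyPDF2.PdfFileReader(f, strict=False)
    print(reader.getNumPages())
"""


def count_pages_with_timeout(file_name, timeout_s=60, script=_COUNT_SCRIPT):
    """Return the page count, or None if the file fails or exceeds timeout_s.

    Hypothetical helper: the name, the `script` parameter, and the default
    timeout are illustrative choices, not part of PyPDF2.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", script, file_name],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child; the caller can move on.
        return None
    if result.returncode != 0:
        # The child raised (missing file, PdfReadError, import error, ...).
        return None
    try:
        # Use the last whitespace-separated token in case anything else was printed.
        return int(result.stdout.split()[-1])
    except (IndexError, ValueError):
        return None
```

In a batch loop, a None result can be logged and skipped so that one problematic PDF does not block the rest. Starting a new interpreter per file adds some overhead, which is the price of being able to kill a stuck parse.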