I cannot detect blank pages in a PDF file. I have searched the internet but could not find a good solution.
Using iTextSharp I tried checking the page size and the XObjects, but neither gives an exact result.
I tried
if (xobjects == null || extractedText == null || contentBytes.Length < 20)
    // treat the page as blank
else
    // treat the page as not blank
But most of the time it returns the wrong answer.
The code is below. I am using the iTextSharp library.
For xobjects
PdfDictionary xobjects = resourceDic.GetAsDict(PdfName.XOBJECT);
// here resourceDic is of type PdfDictionary
// I know that if xobjects is null then the page is blank. But sometimes a blank page returns an xobjects dictionary that is not null.
For the content stream
RandomAccessFileOrArray f = reader.SafeFile;
//here reader = new PdfReader(filename);
byte[] contentBytes = reader.GetPageContent(pageNum, f);
// I have measured the size of contentBytes, but sometimes it is more than 20 bytes for a blank page
For the text content
String extractedText = PdfTextExtractor.GetTextFromPage(reader, pageNum, new LocationTextExtractionStrategy());
// sometimes a blank page returns text more than 20 characters long
A very simple way to discover empty pages is to run a Ghostscript command line that uses the bbox device.
Ghostscript's bbox device calculates the coordinates of the minimum rectangle (the 'bounding box') that encloses all points of the page where a pixel would be rendered:
gs \
-o /dev/null \
-sDEVICE=bbox \
input.pdf
On Windows:
gswin32c.exe ^
-o nul ^
-sDEVICE=bbox ^
input.pdf
Result:
GPL Ghostscript 9.05 (2012-02-08)
Copyright (C) 2010 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 6.
Page 1
%%BoundingBox: 27 281 548 804
%%HiResBoundingBox: 27.000000 281.000000 547.332031 804.000000
Page 2
%%BoundingBox: 0 0 0 0
%%HiResBoundingBox: 0.000000 0.000000 0.000000 0.000000
Page 3
%%BoundingBox: 27 302 568 814
%%HiResBoundingBox: 27.949219 302.000000 567.332031 814.000000
Page 4
%%BoundingBox: 27 302 568 814
%%HiResBoundingBox: 27.949219 302.000000 567.332031 814.000000
Page 5
%%BoundingBox: 27 302 568 814
%%HiResBoundingBox: 27.949219 302.000000 567.332031 814.000000
Page 6
%%BoundingBox: 27 302 568 814
%%HiResBoundingBox: 27.949219 302.000000 567.332031 814.000000
As you can see, page 2 of my input document is empty: its bounding box is 0 0 0 0.
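If you want the blank page numbers without reading the output by eye, the report can be filtered. The following is only a minimal sketch: the function name blank_pages is made up for illustration, and it assumes that every page produces exactly one %%BoundingBox line, in page order, as in the output above. It counts those lines instead of relying on the "Page N" lines, so it should also work when Ghostscript runs with -q. Note that the bbox device writes its report to stderr, which is why the usage line redirects with 2>&1.

```shell
# blank_pages (hypothetical helper): reads bbox-device output on stdin and
# prints the number of every page whose bounding box is "0 0 0 0".
# Assumes one %%BoundingBox line per page, in page order.
blank_pages() {
  awk '/^%%BoundingBox:/ {
         page++                                   # one such line per page
         if ($2 == 0 && $3 == 0 && $4 == 0 && $5 == 0)
           print page                             # nothing would be rendered
       }'
}

# Possible usage (input.pdf is a placeholder):
# gs -q -o /dev/null -sDEVICE=bbox input.pdf 2>&1 | blank_pages
```

Keep in mind that this flags only pages where nothing at all would be rendered. A scanned page that is visually white but contains an image would still report a non-empty bounding box.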