PDFBox PDFTextStripperByArea region coordinates

ipavlic · Dec 15, 2011

In what units and in what direction is the Rectangle interpreted when it is passed to PDFTextStripperByArea's function addRegion(String regionName, Rectangle2D rect)?

In other words, if new Rectangle(10,10,100,100) is given as the second parameter, where does the rectangle R start, how big is it (units of the origin values and of the rectangle's size), and in what direction does it extend (the direction of the blue arrows in the illustration)?

[Illustration: PdfBox rectangle]

Answer

Nicolas W. · Jul 16, 2012
new Rectangle(10,10,100,100) means that the rectangle has its upper-left corner at position (10, 10), that is, 10 units from the left edge and 10 units from the top edge of the page. Here a "unit" is 1 pt = 1/72 inch.
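Since regions are given in points, measurements taken in inches or millimetres need converting first. The following is a minimal sketch of that arithmetic; the class RegionUnits and its methods are hypothetical helpers written for illustration and are not part of PDFBox.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper (not part of PDFBox): converts physical lengths to PDF points.
final class RegionUnits {
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    static double inchesToPoints(double inches) {
        return inches * POINTS_PER_INCH;
    }

    static double mmToPoints(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    // Builds a rectangle in points from an upper-left corner and a size given in inches.
    static Rectangle2D regionFromInches(double left, double top, double width, double height) {
        return new Rectangle2D.Double(
                inchesToPoints(left), inchesToPoints(top),
                inchesToPoints(width), inchesToPoints(height));
    }

    public static void main(String[] args) {
        // A 3 in x 2 in region whose corner is 1 in from the left and 0.5 in from the top.
        Rectangle2D region = regionFromInches(1.0, 0.5, 3.0, 2.0);
        System.out.println(region);
    }
}
```

The resulting Rectangle2D can be passed to addRegion in the same way as a rectangle built directly from point values.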

The first 100 is the width of the rectangle and the second is its height, so the rectangle extends to the right and downward from that corner. To sum up, the correct picture is the first one.
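To make the extent concrete, the sketch below prints the span covered by the rectangle from the question, using only java.awt.geom.Rectangle2D. The class RegionExtent and its describe method are made up for this illustration; y is read as growing downward from the top of the page, as described above.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper for illustration only.
final class RegionExtent {
    // Reports the horizontal and vertical span of a region (y measured downward from the top).
    static String describe(Rectangle2D r) {
        return "x: " + r.getMinX() + " to " + r.getMaxX()
                + ", y: " + r.getMinY() + " to " + r.getMaxY();
    }

    public static void main(String[] args) {
        Rectangle2D region = new Rectangle2D.Double(10, 10, 100, 100);
        System.out.println(describe(region));
    }
}
```

The region therefore runs from 10 pt to 110 pt in both directions, measured from the left and top edges.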

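I wrote this code to extract an area of a page; the values are given as arguments to the function:

// (x, y) is the upper-left corner in points, measured from the left and top of the page.
Rectangle2D region = new Rectangle2D.Double(x, y, width, height);
String regionName = "region";

PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.addRegion(regionName, region);
stripper.extractRegions(page);

// Read back the text that was found inside the region.
String text = stripper.getTextForRegion(regionName);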

So, x and y are the absolute coordinates of the upper-left corner of the Rectangle and then you specify its width and height. page is a PDPage variable given as argument to this function.
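Coordinates read from other PDF tools are often expressed in PDF's default user space, where the origin is the lower-left corner and y grows upward. A minimal sketch of converting such a rectangle to the upper-left convention described above follows. It assumes an unrotated page whose box starts at (0, 0), and the class RegionOrigin and its method are hypothetical, not PDFBox API.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper (not part of PDFBox).
final class RegionOrigin {
    // x, yBottom: lower-left corner of the rectangle in bottom-left-origin coordinates (points).
    // pageHeight: height of the page in points (assumption: unrotated page, box origin at 0,0).
    static Rectangle2D fromBottomLeftOrigin(double x, double yBottom,
                                            double width, double height,
                                            double pageHeight) {
        // Distance from the top of the page to the top edge of the rectangle.
        double yTop = pageHeight - (yBottom + height);
        return new Rectangle2D.Double(x, yTop, width, height);
    }

    public static void main(String[] args) {
        // US Letter is 792 pt tall; a 100 pt box whose lower edge sits at y = 682.
        Rectangle2D region = fromBottomLeftOrigin(10, 682, 100, 100, 792);
        System.out.println(region.getX() + ", " + region.getY());
    }
}
```

With these example values the converted rectangle matches the one from the question: its upper-left corner is 10 pt from the left and 10 pt from the top.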