Crop PDF & add margins

Jordan Reiter · May 17, 2013

I have a PDF with a CropBox size of 6" wide x 9" high. I need to add it to a standard letter-sized PDF. If I enlarge the CropBox, the crop marks become visible. So ideally I'd like to crop out just the visible portion of the page, then pad the sides so that the total width and height are letter-sized.

Is this possible using PDFBox or another Java class?
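
For reference, the padding the question asks for is simple arithmetic once everything is expressed in PDF points (1/72 inch). Below is a minimal sketch, assuming the 6" x 9" CropBox should be centered on the letter page. `LetterPadding`, `centeredMargin`, and `translation` are hypothetical names, not part of PDFBox, and the CropBox origin of (36, 36) used in `main` is a made-up example value.

```java
// Sketch only: names are illustrative and not part of PDFBox.
final class LetterPadding {
    static final double POINTS_PER_INCH = 72.0;
    static final double LETTER_WIDTH = 8.5 * POINTS_PER_INCH;   // 612 pt
    static final double LETTER_HEIGHT = 11.0 * POINTS_PER_INCH; // 792 pt

    // Margin on each side of one axis when the content is centered on the page.
    static double centeredMargin(double pageSize, double contentSize) {
        if (contentSize > pageSize) {
            throw new IllegalArgumentException("content is larger than the page");
        }
        return (pageSize - contentSize) / 2.0;
    }

    // Offset (tx, ty) that moves the CropBox's lower-left corner
    // to its centered position on a letter page.
    static double[] translation(double cropLlx, double cropLly,
                                double cropWidth, double cropHeight) {
        double tx = centeredMargin(LETTER_WIDTH, cropWidth) - cropLlx;
        double ty = centeredMargin(LETTER_HEIGHT, cropHeight) - cropLly;
        return new double[] { tx, ty };
    }

    public static void main(String[] args) {
        double cropWidth = 6 * POINTS_PER_INCH;  // 432 pt
        double cropHeight = 9 * POINTS_PER_INCH; // 648 pt

        System.out.println(centeredMargin(LETTER_WIDTH, cropWidth));   // 90.0
        System.out.println(centeredMargin(LETTER_HEIGHT, cropHeight)); // 72.0

        // Assumed example: the CropBox starts at (36, 36) inside the MediaBox.
        double[] t = translation(36, 36, cropWidth, cropHeight);
        System.out.println(t[0] + ", " + t[1]); // 54.0, 36.0
    }
}
```

These offsets are what one would presumably apply as a translation when drawing the cropped page onto a new letter-sized page, together with a clip to the 6" x 9" area so that the crop marks stay hidden.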

Answer

John Pink · Oct 2, 2015

Have you found an answer to your problem? I have been facing the same scenario this week.

I have a standard letter-size (8.5" x 11") PDF A containing a header, a footer, and a form. I have no control over how that PDF is generated, so the header and footer are a bit dirty and I need to remove them. My first approach was to extract the form into a Box (any type of box works) and then export it as a new PDF page. The problem is that my new Box has a certain size (let's say 6" x 7"), and after thorough research in the docs, I was unable to find a way to embed it into an 8.5" x 11" PDF B; the output PDF was the same size as my Box. Every attempt led either to a blank PDF file of the right size or to a PDF containing my form but with the wrong dimensions.

I then had no choice but to use another approach. It isn't very clean, but hey, when working with PDFs, black magic and workarounds are par for the course. I simply kept the original PDF A and blanked out all the unwanted parts: I created rectangles, filled them with white, and placed them over the sections I wanted to hide. The result is a PDF file of the right dimensions containing only my form. Hooray! Technically, the header and footer are still present in the page; there was no way to actually remove them, only to hide them. This makes no difference to the end user, but don't rely on it for sensitive data, since the covered content can still be extracted.

I realize your question was submitted 2 years ago, but I had a very hard time finding a proper answer to my question online, so here's me giving back to the community and hoping I can help future developers save some time. If you actually found a way to extract a box and embed it in a standard-size page, please post your answer!

Here is my code, by the way:

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;

import java.awt.Color;
import java.io.*;
import java.util.List;

// This code doesn't actually extract PDF elements per se.
// It fills 2 white rectangles to hide the header and the footer of each PDF page.
public class ex {

    // Arbitrary values obtained in a very obscure way, in PDF points (1/72 inch).
    // For comparison, a US Letter page is nominally 612 x 792 points.
    static int PAGE_WIDTH = 615;
    static int PAGE_HEIGHT = 815;

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws IOException, COSVisitorException {

        File inputFile = new File("C:\\input.pdf");
        File outputFile = new File("C:\\output.pdf");

        PDDocument inputDoc = PDDocument.load(inputFile);
        PDDocument outputDoc = new PDDocument();

        List<PDPage> pages = inputDoc.getDocumentCatalog().getAllPages();

        PDPageContentStream pageCS = null;

        // Let's paint our pages white!
        for (PDPage page : pages) {
            pageCS = new PDPageContentStream(inputDoc, page, true, false);
            pageCS.setNonStrokingColor(Color.white);
            // Bottom rectangle (footer): PDF coordinates start at the bottom-left corner
            pageCS.fillRect(0, 0, PAGE_WIDTH, 30);
            // Top rectangle (header)
            pageCS.fillRect(0, PAGE_HEIGHT-30, PAGE_WIDTH, 30);
            pageCS.close();
            outputDoc.addPage(page);
        }

        // Save to file
        outputFile.delete();
        outputDoc.save(outputFile);

        // Wait until the end to close all documents, or else you get an error
        inputDoc.close();
        outputDoc.close();
    }
}
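
The strip positions in the code above depend on PDF user space having its origin at the bottom-left corner of the page, with y increasing upward. Here is a minimal sketch of that arithmetic, assuming a MediaBox whose lower-left corner is (0, 0) and a nominal 612 x 792 point letter page. `CoverStrips` and its methods are hypothetical helpers, not PDFBox API.

```java
// Sketch only: computes the rectangles that cover a header and a footer.
// Each rectangle is {x, y, width, height}, with y measured from the bottom edge.
final class CoverStrips {

    // The footer strip sits on the bottom edge, so it starts at y = 0.
    static double[] footerRect(double pageWidth, double footerHeight) {
        return new double[] { 0, 0, pageWidth, footerHeight };
    }

    // The header strip ends at the top edge, so it starts headerHeight below it.
    static double[] headerRect(double pageWidth, double pageHeight, double headerHeight) {
        return new double[] { 0, pageHeight - headerHeight, pageWidth, headerHeight };
    }

    public static void main(String[] args) {
        double pageWidth = 612;  // 8.5 in * 72 pt
        double pageHeight = 792; // 11 in * 72 pt

        double[] footer = footerRect(pageWidth, 30);
        double[] header = headerRect(pageWidth, pageHeight, 30);

        System.out.println("footer y = " + footer[1]); // footer y = 0.0
        System.out.println("header y = " + header[1]); // header y = 762.0
    }
}
```

Reading the page size from the page's own MediaBox rather than hard-coding it would likely make the strips line up on pages that are not exactly letter-sized.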