How to read a PDF file using Selenium

Bugasur · Nov 22, 2016 · Viewed 27.1k times

I am working on a web page that has a link which, when clicked, opens a PDF file in a new window. I have to read that PDF file to validate some data against the transactions that were done. One way is to download the file and then read it. Can anyone help me out with this? I have to work on IE 11.

Thanks in Advance.

Answer

Kenil Fadia · Nov 25, 2016

Use PDFBox and FontBox.

    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;

    public String readPDFInURL() throws IOException {
        WebDriver driver = new FirefoxDriver();
        try {
            // page with example pdf document
            driver.get("file:///C:/Users/admin/Downloads/dotnet_TheRaceforEmpires.pdf");
            URL url = new URL(driver.getCurrentUrl());
            // try-with-resources closes the document and the stream, even on failure
            try (InputStream fileToParse = new BufferedInputStream(url.openStream());
                 PDDocument document = PDDocument.load(fileToParse)) {
                // extract the text content of every page
                return new PDFTextStripper().getText(document);
            }
        } finally {
            // always shut the browser down
            driver.quit();
        }
    }

Since some of the functions from older versions of PDFBox have been deprecated, you need to add the FontBox library alongside PDFBox. I used PDFBox 2.0.3 and FontBox 2.0.3 and it works fine. It only extracts text, though; it won't read images.
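
For the validation step mentioned in the question, here is a minimal sketch of one way to check the extracted text against expected transaction data. It assumes you already have the PDF text as a String (for example from the method above). The class name `PdfTextCheck`, the method names, and the sample values are hypothetical, not part of PDFBox or Selenium. Extracted text usually contains line breaks and uneven spacing, so this sketch collapses whitespace before comparing.

    import java.util.ArrayList;
    import java.util.List;

    public class PdfTextCheck {
        // Collapse every run of whitespace (including line breaks) into one space.
        static String normalize(String text) {
            return text.replaceAll("\\s+", " ").trim();
        }

        // Return the expected values that do NOT appear in the extracted text.
        // An empty result means every expected value was found.
        static List<String> findMissing(String pdfText, List<String> expectedValues) {
            String haystack = normalize(pdfText);
            List<String> missing = new ArrayList<>();
            for (String expected : expectedValues) {
                if (!haystack.contains(normalize(expected))) {
                    missing.add(expected);
                }
            }
            return missing;
        }
    }

You would call `PdfTextCheck.findMissing(readPDFInURL(), expectedValues)` and fail the test if the returned list is not empty. This is a plain substring match, so it may need tightening if short values such as amounts can appear in more than one place in the document.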