Print contents of a PDF to the command line

andronikus · Oct 11, 2011

I'm looking for a command-line program that prints the text of a PDF file, just like cat does for a text file. I'm pretty sure such a thing exists because I remember using it a few months ago. I could have sworn it was pdfcat, but that's just a merging utility. I've found pdftotext, and that would be workable, but I'd prefer something that replicates cat's behavior because I want to pipe the output to grep. Thanks!

Answer

jsvk · Oct 11, 2011

On the man page for pdftotext, I found this:

pdftotext [options] [PDF-file [text-file]]

Description: Pdftotext converts Portable Document Format (PDF) files to plain text.

Pdftotext reads the PDF file, PDF-file, and writes a text file, text-file. If text-file is not specified, pdftotext converts file.pdf to file.txt. If text-file is '-', the text is sent to stdout.

So if your goal is to write to stdout in order to pipe to grep, pdftotext mydoc.pdf - should work just like cat mytext.txt. The full command would therefore be pdftotext mydoc.pdf - | grep mysearchterm
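
If you do this often, a small shell function can stand in for the cat-like tool you remember. This is only a sketch under a couple of assumptions: the name pdfcat is my own choice here (a hypothetical helper, not a command that ships with pdftotext), pdftotext is assumed to be on your PATH, and mydoc.pdf and mysearchterm are placeholders.

```shell
# pdfcat: hypothetical helper name (not part of pdftotext itself).
# Prints the extracted text of every PDF given as an argument to stdout,
# in order, so it can be piped the same way as cat.
pdfcat() {
    for f in "$@"; do
        # A text-file argument of '-' makes pdftotext write to stdout.
        pdftotext "$f" -
    done
}

# Usage (file names and search term are placeholders):
#   pdfcat mydoc.pdf | grep mysearchterm
#   pdfcat first.pdf second.pdf | grep -i -n mysearchterm
```

Wrapping the call in a function keeps the trailing - out of your way and lets you pass several PDFs at once. Note that the output of all files is concatenated, so grep will not tell you which PDF a match came from.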