Read, highlight, save PDF programmatically

Jake · Sep 30, 2011

I'd like to write a small script (which will run on a headless Linux server) that reads a PDF, highlights any text that matches one of the strings in an array I pass in, and then saves the modified PDF. I expect I'll end up using something like the Python bindings to Poppler, but unfortunately there's almost no documentation for them and I have almost no experience with Python.

If anyone could point me to a tutorial, example, or some helpful documentation to get me started it would be greatly appreciated!
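A minimal sketch of the read, highlight, save pipeline described above. It assumes the third-party PyMuPDF library (imported as `fitz`) instead of the Poppler bindings, because PyMuPDF can both search page text and write highlight annotations without a GUI. `highlight_pdf` and `clean_needles` are hypothetical helper names, not part of any library.

```python
from typing import Iterable, List


def clean_needles(needles: Iterable[str]) -> List[str]:
    """Drop empty strings and duplicates, keeping the caller's order.

    Hypothetical helper: avoids searching for "" and avoids
    highlighting the same hit twice.
    """
    seen = set()
    result = []
    for needle in needles:
        if needle and needle not in seen:
            seen.add(needle)
            result.append(needle)
    return result


def highlight_pdf(src_path: str, dst_path: str, needles: Iterable[str]) -> int:
    """Highlight every occurrence of each needle and save to dst_path.

    Assumes PyMuPDF is installed (pip install PyMuPDF). dst_path should
    differ from src_path; overwriting the open file needs an incremental save.
    Returns the number of highlight annotations added.
    """
    import fitz  # PyMuPDF; imported here so clean_needles() works without it

    targets = clean_needles(needles)
    added = 0
    doc = fitz.open(src_path)
    try:
        for page in doc:
            for needle in targets:
                # search_for returns one rectangle per match on this page
                for rect in page.search_for(needle):
                    page.add_highlight_annot(rect)
                    added += 1
        doc.save(dst_path)
    finally:
        doc.close()
    return added
```

Usage would look something like `highlight_pdf("in.pdf", "out.pdf", ["invoice", "total due"])`, where the file names and search strings are placeholders.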

Answer

Albert Perrien · Sep 30, 2011

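Have you tried looking at PDFMiner? It's a Python library for extracting the text of a PDF together with the position of that text on each page, so it sounds like it does what you want.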
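A rough sketch of how PDFMiner could locate the strings to highlight. It assumes the maintained `pdfminer.six` fork and its `extract_pages` API; `iter_text_lines` and `find_matches` are hypothetical helper names. Matches are reported per text line (the whole line's bounding box, in PDF points with the origin at the bottom-left), not per character. As far as I know PDFMiner only reads PDFs, so drawing the highlight and saving the file would still need a separate library.

```python
from typing import Iterable, Iterator, List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
TextLine = Tuple[int, str, BBox]          # (page number, line text, bbox)


def iter_text_lines(path: str) -> Iterator[TextLine]:
    """Yield every text line PDFMiner finds, with its 1-based page number."""
    # pdfminer.six, imported lazily so find_matches() can be used on its own
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer, LTTextLine

    for page_no, page_layout in enumerate(extract_pages(path), start=1):
        for element in page_layout:
            if not isinstance(element, LTTextContainer):
                continue
            # A text box iterates over its lines; a bare line is used as-is
            lines = [element] if isinstance(element, LTTextLine) else element
            for line in lines:
                if isinstance(line, LTTextLine):
                    yield page_no, line.get_text(), line.bbox


def find_matches(lines: Iterable[TextLine], needles: Iterable[str]) -> List[Tuple[int, str, BBox]]:
    """Return (page, needle, line_bbox) for every line containing a needle."""
    wanted = [n for n in needles if n]  # skip empty search strings
    hits = []
    for page_no, text, bbox in lines:
        for needle in wanted:
            if needle in text:
                hits.append((page_no, needle, bbox))
    return hits
```

Something like `find_matches(iter_text_lines("in.pdf"), ["invoice", "total due"])` would then list where each string appears, with the file name and strings as placeholders.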