I'm looking for a fast and reliable way to read/parse large PDF files in Ruby (on Linux and OSX).
Until now I've found the rather old and simple PDF-toolkit (a pdftotext-wrapper) and PDF-reader, which was unable to read most of my files. Though the two libraries provide exactly the functionality I was looking for.
My question: Have I missed something? Is there a tool that is better suited (faster and more reliable) to solve my problem?
You might find Docsplit useful:
Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages...)