Deals with extracting useful information from PDF content (for example, text or images).
I have a PDF which contains tables, text and some images. I want to extract the table wherever tables are …
python pdf pdf-parsing
Are there any open-source libraries that support table identification & extraction? By this I mean: Identify a table structure …
python pdf scrape pdf-parsing pdf-scraping
I have a stack of PDFs - potentially hundreds or thousands. They are not all formatted the same, but any …
parsing pdf extract pdf-parsing
I'm trying to figure out a good way to increase the productivity of my data entry job. What I am …
excel pdf ocr screen-scraping pdf-parsing
I'm looking for a fast and reliable way to read/parse large PDF files in Ruby (on Linux and OSX). …
ruby-on-rails ruby pdf pdf-parsing
I am using meteor-react for uploading PDF docs to my Node.js backend, where I want to read the uploaded …
node.js pdf-parsing
We are developing a PDF parser to be used along with our system. The requirement is that we store …
pdf licensing itextsharp itext pdf-parsing
I'm trying to get the data from the tables in this PDF. I've tried pdfminer and pypdf with a little …
python python-2.7 ocr pdfminer pdf-parsing
I'm trying to extract text from a large number of PDFs using PDFMiner Python bindings. The module I wrote works …
python pypdf pdf-parsing pdf-manipulation
I have about 1,500 PDFs consisting of only 1 page each, and exhibiting the same structure (see http://files.newsnetz.ch/…
python node.js parsing web-scraping pdf-parsing