Working on tables in pdf using python

python pdf pdf-scraping

sam · Mar 20, 2012 · Viewed 12.4k times · Source

I am working on a PDF file that contains a number of tables.
Using the table names given in the PDF, I want to fetch the data from each table with Python.

I have worked on HTML and XML parsing, but never with PDF.
Can anyone tell me how to fetch tables from a PDF using Python?

Answer

I think you need a PDF parsing library for Python. One of the best known is PDFMiner.

According to the documentation:

PDFMiner is a tool for extracting information from PDF documents. Unlike other PDF-related tools, it focuses entirely on getting and analyzing text data. PDFMiner allows one to obtain the exact location of text in a page, as well as other information such as fonts or lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It has an extensible PDF parser that can be used for other purposes than text analysis.
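Because PDFMiner reports the position of each piece of text, one way to approximate a table is to group text lines that share roughly the same vertical position into rows. The following is only a rough sketch, not a complete table extractor. It assumes the maintained `pdfminer.six` fork (its `extract_pages` function and the `LTTextContainer` / `LTTextLine` layout classes); the names `Fragment`, `read_fragments`, `group_into_rows` and the `y_tolerance` value are made up for illustration.

```python
from collections import namedtuple

# Hypothetical helper type: one line of text and its lower-left corner.
Fragment = namedtuple("Fragment", "x y text")


def read_fragments(pdf_path):
    """Yield (page_number, Fragment) for every non-empty text line in the PDF."""
    # Assumption: pdfminer.six is installed. Imported lazily so that
    # group_into_rows() below can be used and tested without it.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer, LTTextLine

    for page_number, page in enumerate(extract_pages(pdf_path), start=1):
        for element in page:
            if not isinstance(element, LTTextContainer):
                continue  # skip figures, rules, images, ...
            for line in element:
                if isinstance(line, LTTextLine):
                    text = line.get_text().strip()
                    if text:
                        yield page_number, Fragment(line.x0, line.y0, text)


def group_into_rows(fragments, y_tolerance=3.0):
    """Group fragments with similar y into rows, each ordered left to right."""
    rows = []  # list of (row_y, [Fragment, ...])
    # PDF y coordinates grow upward, so sort descending to read top to bottom.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if rows and abs(rows[-1][0] - frag.y) <= y_tolerance:
            rows[-1][1].append(frag)
        else:
            rows.append((frag.y, [frag]))
    return [[f.text for f in sorted(cells, key=lambda f: f.x)]
            for _, cells in rows]
```

To find a table by its name, you could look for the fragment whose text matches the table title and pass only the fragments on the same page with a smaller `y` to `group_into_rows`. How well this works depends heavily on how the PDF was produced, so the tolerance will likely need tuning per document.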
