I am working on a pdf file. There is number of tables in that pdf.
According to the table names given in the pdf, I wanted to fetch the data from that table using python.
I have worked on html, xlm parsing but never with pdf.
Can anyone tell me how to fetch tables from pdf using python?
I think that you need a python parser library. The most famous is PDFMiner.
According to the documentation :
PDFMiner is a tool for extracting information from PDF documents. Unlike other PDF-related tools, it focuses entirely on getting and analyzing text data. PDFMiner allows one to obtain the exact location of text in a page, as well as other information such as fonts or lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It has an extensible PDF parser that can be used for other purposes than text analysis.