Is there any python module to convert PDF files into text? I tried one piece of code found in Activestate which uses pypdf but the text generated had no space between and was of no use.
Try PDFMiner. It can extract text from PDF files as HTML, SGML or "Tagged PDF" format.
The Tagged PDF format seems to be the cleanest, and stripping out the XML tags leaves just the bare text.
A Python 3 version is available under: