How to convert PDF to HTML?

Luchian Grigore · Dec 3, 2011 · Viewed 33.4k times

Is there a good library I can use to convert PDF to HTML, or to some other format that can easily be converted to HTML?

I searched similar questions, but had no luck.

I want to be able to extract text from PDFs, and possibly images. I'm not looking to embed the PDF inside the HTML.

Answer

moof2k · Nov 27, 2016

If you're on Linux, try pdftohtml, which ships with poppler-utils (the install command below is for Debian/Ubuntu):

sudo apt-get install poppler-utils
pdftohtml -enc UTF-8 -noframes infile.pdf outfile.html

On macOS (with Homebrew), pdftohtml can be installed with the command below; if that formula is unavailable, the poppler formula also provides pdftohtml:

brew install pdftohtml
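
To convert more than one file, the pdftohtml command above can be wrapped in a small loop. The following is only a sketch, not part of the original answer: the function name convert_pdfs and the PDFTOHTML override variable are hypothetical names chosen for illustration, and it assumes pdftohtml is already installed and on your PATH.

```shell
#!/bin/sh
# Sketch: convert every PDF in a directory to a single-file HTML page,
# using the same flags as the command shown earlier.
# convert_pdfs and PDFTOHTML are illustrative names, not part of poppler.

convert_pdfs() {
    dir=${1:-.}                      # directory to scan, default: current
    tool=${PDFTOHTML:-pdftohtml}     # allow pointing at a different binary

    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "error: $tool not found; install poppler-utils first" >&2
        return 1
    fi

    for pdf in "$dir"/*.pdf; do
        # If nothing matches, the glob stays literal, so skip non-files.
        [ -f "$pdf" ] || continue
        html="${pdf%.pdf}.html"
        "$tool" -enc UTF-8 -noframes "$pdf" "$html" || echo "failed: $pdf" >&2
    done
}
```

After sourcing the file, calling convert_pdfs ./docs would write one .html file next to each .pdf in ./docs. A failed conversion is reported and the loop moves on to the next file.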

The open-source ebook converter Calibre can also convert PDF files to HTML (its command-line tool, ebook-convert, writes HTML output as a zipped HTMLZ archive) and is available on macOS, Windows and Linux.