Converting doc, docx, pdf to HTML using PHP on Linux

sam · May 13, 2011 · Viewed 7.6k times

I run a job search site, and I need to convert doc, docx, and pdf files into HTML on a Linux CentOS server running PHP. People submit these files as resumes. So far, I have found PHPDocx to be great at converting docx to HTML, but I am stuck on doc and pdf. pdftohtml gives the error "bad color" when I run tests. As for doc, I have only found wvWare, which seems complex and bulky to install.

Does anyone have any ideas on how to easily convert doc/pdf to HTML?

Answer

Ch33f · Aug 20, 2013

The only thing I can think of is FPDF. It is intended for creating PDF files in PHP, and with an extension it can also import existing PDF files. Maybe you can use that as a base and develop some sort of toHTML function for it.

It is completely free to use and it has some extensions already. It MIGHT help you.

http://www.fpdf.org

EDIT: Thanks to Pierre for this addition to my post in the comments:

You can use FPDI: http://www.setasign.de/products/pdf-php-solutions/fpdi but the imported PDF is treated just like an image.

I haven't taken a look at it myself so far, but this might help.
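
If the command-line tools mentioned in the question can be made to work on your server, calling them from PHP could look roughly like the sketch below. It is only an illustration: buildConvertCommand is a made-up helper name, it assumes pdftohtml (from Poppler) and wvHtml (from wvWare) are installed and on the PATH, and the exact output file naming may differ between versions of those tools. It does not address the "bad color" error, and docx is left to PHPDocx as in the question.

```php
<?php
// Hypothetical helper (not part of any library): pick a converter command
// based on the uploaded file's extension. Assumes pdftohtml and wvHtml are
// installed; adjust the commands to whatever actually works on your server.
function buildConvertCommand(string $input, string $outputHtml): ?string
{
    $ext = strtolower(pathinfo($input, PATHINFO_EXTENSION));

    // Always escape user-supplied file names before passing them to a shell.
    $in  = escapeshellarg($input);
    $out = escapeshellarg($outputHtml);

    switch ($ext) {
        case 'pdf':
            // -noframes: one HTML file instead of a frameset; -i: ignore images
            return "pdftohtml -noframes -i $in $out";
        case 'doc':
            return "wvHtml $in $out";
        default:
            // docx and anything else is handled elsewhere (e.g. PHPDocx)
            return null;
    }
}
```

You would then pass the returned string to exec() together with an output array and a status variable, and treat any non-zero exit status as a failed conversion so that a broken resume does not go unnoticed.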