Generate HTML with images from a PDF on Linux using pdftohtml (poppler-utils)

Avijit Majumder · Apr 28, 2013

I am currently working with pdftohtml (from poppler-utils) on CentOS. The concept is simple: a user uploads a PDF file and sees an HTML version of that file. I use this simple command:

$> pdftohtml source.pdf target.html 

but it doesn't work! Later on, I tried to create the HTML using the complex-mode switch with no frames:

$> pdftohtml -c -noframes source.pdf target.html

Still no luck! The problem is that the images embedded in the PDF file don't appear in the HTML, and sometimes the images overlap. Any ideas?

Here is the PHP code:

Add.php

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<link href="css/style.css" rel="stylesheet" type="text/css"/>
<title>CompleteView</title>
</head>
<body>
 <form method="post" action="save.php" enctype="multipart/form-data">
       <input type="hidden" name="action" value="add">
       <table>
        <tr class="dark_bgcolor text-content">
         <td align="left" width="20%">Upload</td>
         <td align="left" width="1%">:</td>
         <td align="left">
         <input type="file" name="img_full" class="look" size="50"> 
         (Only .pdf)
         </td>
       </tr>

       <tr class="bottom_bgcolor">
         <td align="center" colspan="3"><input type="submit" name="" value="Upload" class="look"></td>
       </tr>
       </table>
       </form>
</body>
</html>

Save.php

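<?php
// Create a per-upload folder with a random name.
$myNewFolderPath=rand();
mkdir($myNewFolderPath);
$fname="full_".uniqid("");
$filename=$fname.'.pdf';
//$uploadpath=SPL_IMG_UPLOADPATH.$filename;
move_uploaded_file($_FILES['img_full']['tmp_name'], $myNewFolderPath.'/'.$filename);
chmod($myNewFolderPath.'/'.$filename, 0777);
// Debug output is commented out: anything echoed before header() stops the redirect.
//echo ('/usr/local/bin/pdftohtml '.$myNewFolderPath.'/'.$filename);
// Note the space after -noframes; without it the flag and the path run together.
exec('/usr/local/bin/pdftohtml -c -noframes '.$myNewFolderPath.'/'.$filename);
// With no output name given, pdftohtml writes the HTML next to the PDF, so redirect into that folder.
header('Location: '.$myNewFolderPath.'/'.$fname.'.html');
//exec('/usr/local/bin/pdftohtml 2098602105/EssentialC.pdf');
?>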

One more thing: the pdftohtml version is 0.36.

Here are the screenshots:

[screenshot]

Result: [screenshot]

Answer

Justin · Aug 14, 2013
$ pdftohtml -c source.pdf target.html 

This produces output in complex mode. You can't combine -noframes with the complex flag (-c); the man page says so:

$ man pdftohtml

 -noframes   generate no frames. Not supported in complex output mode.
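
As a rough illustration of the advice above, here is a minimal shell sketch that runs the conversion in complex mode without -noframes and writes into a per-upload directory. The function name convert_pdf and the PDFTOHTML override variable are hypothetical, not part of poppler-utils, and the sketch assumes pdftohtml is on the PATH and accepts an output base name as its second argument. The exact names of the generated files may differ between versions.

```shell
# Hypothetical helper (not part of poppler-utils): convert one PDF in complex mode.
# Set PDFTOHTML to override the binary, e.g. /usr/local/bin/pdftohtml.
convert_pdf() {
    cp_src=$1                             # path to the uploaded PDF
    cp_outdir=$2                          # directory that will hold the output files
    cp_base=$(basename "$cp_src" .pdf)    # output base name, e.g. "source"
    mkdir -p "$cp_outdir" || return 1
    # -c selects complex mode; -noframes is deliberately left out.
    "${PDFTOHTML:-pdftohtml}" -c "$cp_src" "$cp_outdir/$cp_base"
}
```

Calling `convert_pdf source.pdf out` would then run `pdftohtml -c source.pdf out/source`. Keeping each upload in its own directory means the page the user is redirected to and the files pdftohtml generates alongside it (typically including the extracted images) can be served from the same place.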