How to convert HTML to DOC in PHP

Mohit Jain · Feb 12, 2011 · Viewed 31k times

I need to convert an HTML file to DOC. I am already using html2pdf for PDF conversion.

Is there a similar library for HTML-to-DOC conversion?

(PS: it must be free/open source.)

EDIT

After Mark Eirich's comment:

Here are two screenshots. The Word document is not properly aligned; note the vertical scrollbar in the Word document.

Word document (check the vertical scroll)

HTML file in the browser

The body tag is:

<body style="margin-left:350px; margin-right:350px;">

I tried adjusting it, but it had no effect.

EDIT 2

After Mark Eirich's second comment, I learned that Word interprets sizes in pixels, not percentages. My last remaining issue is the background. Any help? Please check the two screenshots: the difference is the outer box, and that is why the HTML-generated DOC looks odd.

Original Word file

HTML-generated DOC file

Answer

RobertPitt · Feb 12, 2011

In my opinion the answer is no, for the following reasons:

Microsoft Office documents are extremely complex in their design. They are not just a formatted file with references to objects such as images; there is a type of file system inside the file itself to manage the binary data of those objects.

Let me bring in a quote from our very own Joel:

If you started reading these documents with the hope of spending a weekend writing some spiffy code that imports Word documents into your blog system, or creates Excel-formatted spreadsheets with your personal finance data, the complexity and length of the spec probably cured you of that desire pretty darn quickly. A normal programmer would conclude that Office’s binary file formats:

  • are deliberately obfuscated
  • are the product of a demented Borg mind
  • were created by insanely bad programmers
  • and are impossible to read or create correctly.

You’d be wrong on all four counts....

Further down, the same article offers a possible solution:

If you really want to generate fancy formatted Word documents, your best bet is to create an RTF document. Everything that Word can do can be expressed in RTF, but it’s a text format, not binary, so you can change things in the RTF document and it’ll still work. You can create a nicely formatted document with placeholders in Word, save as RTF, and then using simple text substitution, replace the placeholders on the fly. Now you have an RTF document that every version of Word will open happily.

@source: http://www.joelonsoftware.com/items/2008/02/19.html
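The placeholder approach from the quote can be sketched in PHP roughly as follows. This is only an illustration under assumptions: the `[[NAME]]` placeholder syntax, the helper names `rtfEscape` and `fillRtfTemplate`, and the inline template (standing in for a file saved from Word as RTF) are all made up for this example.

```php
<?php
// Hypothetical helper: escape plain text for use inside an RTF body.
function rtfEscape(string $text): string
{
    // Backslash, "{" and "}" are RTF syntax characters and must be escaped.
    $text = str_replace(['\\', '{', '}'], ['\\\\', '\\{', '\\}'], $text);
    // Represent line breaks with the RTF \line control word.
    return str_replace(["\r\n", "\n"], '\\line ', $text);
}

// Hypothetical helper: replace [[KEY]] placeholders with escaped values.
function fillRtfTemplate(string $template, array $values): string
{
    $map = [];
    foreach ($values as $key => $value) {
        $map['[[' . $key . ']]'] = rtfEscape((string) $value);
    }
    // strtr() replaces every placeholder in one pass and does not
    // re-scan the text it has just inserted.
    return strtr($template, $map);
}

// A tiny inline template; in practice this would be read from an
// RTF file that was designed in Word and saved as RTF.
$template = '{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}\f0\fs24 '
    . 'Dear [[NAME]],\par Your total is [[TOTAL]].\par}';

$rtf = fillRtfTemplate($template, ['NAME' => 'Mohit {Jain}', 'TOTAL' => 42]);
file_put_contents(sys_get_temp_dir() . '/letter.rtf', $rtf);
```

Two caveats: this sketch does not handle non-ASCII characters (RTF expects those as `\u` escapes), and Word may split a placeholder across formatting runs when saving, so it is worth checking the saved RTF in a text editor to confirm each placeholder is still one contiguous string.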

Some links that may interest you along your journey:

Alternatively, try opening a Word file with WinRAR ;). Maybe creating an archive with certain headers and then changing the extension would suffice; I have never tried it.
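The archive idea does apply to the newer .docx format (Word 2007 and later), which is a ZIP package of XML parts; the legacy binary .doc format is not a ZIP file. A minimal sketch of building such a package with PHP's `ZipArchive` (requires the zip extension) might look like this. The helper name `writeMinimalDocx` is made up for this example, and it writes only a single unformatted paragraph.

```php
<?php
// Hypothetical helper: write a one-paragraph .docx using ZipArchive (ext-zip).
function writeMinimalDocx(string $path, string $text): bool
{
    // Declares the content type of each part in the package.
    $contentTypes = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        . '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        . '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        . '<Default Extension="xml" ContentType="application/xml"/>'
        . '<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
        . '</Types>';

    // Points the package at its main document part.
    $rels = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        . '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        . '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
        . '</Relationships>';

    // The document body: one paragraph containing one run of escaped text.
    $body = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    $document = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        . '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        . '<w:body><w:p><w:r><w:t>' . $body . '</w:t></w:r></w:p></w:body>'
        . '</w:document>';

    $zip = new ZipArchive();
    if ($zip->open($path, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        return false;
    }
    $zip->addFromString('[Content_Types].xml', $contentTypes);
    $zip->addFromString('_rels/.rels', $rels);
    $zip->addFromString('word/document.xml', $document);
    return $zip->close();
}

$path = sys_get_temp_dir() . '/hello.docx';
$ok = writeMinimalDocx($path, 'Hello from PHP');
```

This covers plain text only; turning arbitrary HTML (margins, backgrounds, tables) into the corresponding WordprocessingML is a much larger job that this sketch does not attempt.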