How to convert HTML to .docx using docx4j?

Jalal Sordo · Dec 9, 2013 · Viewed 17.5k times

I read some articles about converting HTML to .docx and found that docx4j gives pretty decent results. Could anyone provide the following:

  1. The required JARs and their versions.
  2. Sample code for converting HTML to .docx.

Sorry that I can't post an attempt; I haven't tried anything for this task yet. I do use Apache POI to convert the byte[] I get from the database to HTML, which I display in a rich text editor in a JSF application. Please enlighten me, I'm lost in stress and confusion!

Answer

JasonPlutext · Dec 9, 2013

To import XHTML, add this Maven dependency:

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-ImportXHTML</artifactId>
    <version>3.0.0</version>
</dependency>

For more on the Maven setup, see http://www.docx4java.org/blog/2013/11/docx4j-3-0-and-maven/

For sample code, see https://github.com/plutext/docx4j-ImportXHTML/tree/master/src/samples/java/org/docx4j/samples

Note that your input needs to be well-formed XML, so if you have HTML, you'll need to tidy it first (with one of the many Java libraries that can do this for you, such as JTidy or jsoup).
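
To see whether a given string already meets the well-formedness requirement before handing it to the importer, something like the following can help. It is only a minimal sketch using the JDK's built-in XML parser; the class name WellFormedCheck and the method isWellFormed are made up for illustration and are not part of docx4j. It assumes the markup is a fragment or document without a DOCTYPE (with one, the parser may try to load the DTD).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of docx4j): reports whether markup parses as XML.
class WellFormedCheck {

    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input, so it is not well-formed XML.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not expected for an in-memory string with a default parser.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Self-closed <br/> is valid XML.
        System.out.println(isWellFormed("<html><body><p>Hello<br/>world</p></body></html>")); // true
        // A bare <br> is fine in HTML but leaves an unclosed element in XML.
        System.out.println(isWellFormed("<html><body><p>Hello<br>world</p></body></html>"));  // false
    }
}
```

If the check returns false, run the HTML through a tidying library first and pass the cleaned-up XHTML to the importer, as in the linked samples.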