Convert HTML to DOCX

MrWayne picture MrWayne · Oct 10, 2014 · Viewed 20k times · Source

My question is very specific and I hope that someone has done this conversion from HTMLto DOCX.

To do this I took a sample code from github and tried it in my local Eclipse Setup.

import java.io.File;
import java.io.FileNotFoundException;

import javax.xml.bind.JAXBException;

import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.exceptions.InvalidFormatException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart;

public class HtmlToDocConvert {

    /**
     * @param args
     * @throws FileNotFoundException
     * @throws JAXBException
     * @throws Docx4JException
     */
    public static void main(String[] args) throws FileNotFoundException,
            JAXBException, Docx4JException {
        // TODO Auto-generated method stub

        // File file = new File("C:\\TestWordToHtml\\html\\Test.html");

        String inputfilepath = "C:\\TestWordToHtml\\html\\Test.html";

        try {

            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
                    .createPackage();

            NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
            wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
            ndp.unmarshalDefaultNumbering();

            XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(
                    wordMLPackage);
            xHTMLImporter.setHyperlinkStyle("Hyperlink");
            wordMLPackage.getMainDocumentPart().getContent().addAll(
                    xHTMLImporter.convert(new File(inputfilepath), null));

            File output = new java.io.File(System.getProperty("user.dir")
                    + "/html_output.docx");
            wordMLPackage.save(output);
            System.out.println("done");

            System.out.println("file path where it is stored is" + " "
                    + output.getAbsolutePath());

        }

        catch (InvalidFormatException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }

}

Above code is giving me an error as follows

Exception in thread "main" java.lang.NoSuchMethodError: org.docx4j.org.xhtmlrenderer.docx.DocxRenderer.(Ljava/lang/String;)V at org.docx4j.convert.in.xhtml.XHTMLImporterImpl.getRenderer(XHTMLImporterImpl.java:252) at org.docx4j.convert.in.xhtml.XHTMLImporterImpl.convert(XHTMLImporterImpl.java:466) at HtmlToDocConvert.main(HtmlToDocConvert.java:41)

Jars in my projects to achieve this are as following.

docx4j-3.2.1.jar
docx4j-ImportXHTML-3.2.1.jar
slf4j-api-1.7.7.jar
slf4j-log4j12-1.7.7.jar
xhtmlrenderer-1.0.0.jar
log4j.jar

I have stripped the xhtmlrendere.jar file to view DOCRendered class and saw that there was no init method inside it.I have spent close to half a day to figure out this thing and I am not sure if this is correct way to do the conversion or this is even possible.

If someone has done this can he/she sent me correct xhtmlrenderer.jar file or anypother dependency to achieve this simple task.

Thanks in Advance

Regards, Bhanu

Answer

Alex Tape picture Alex Tape · Oct 10, 2014

This is not the complete example, is it? Just take a look at ConvertInXHTMLFile.java from docx4j examples.

IMHO you are missing basic parts of the procedure. Furthermore, this topic has been discussed already:

Convert html to doc in java

How to convert HTML to a Microsoft Word document ?

Convert HTML to Microsoft Word Document in Java

how to convert HTML to .docx using docx4j?