Converting RTF to HTML with formatting in Java

dcc · Mar 13, 2014

I can use JEditorPane to parse RTF text and convert it to HTML, but the HTML output loses some of the formatting, in this case the strike-through markup. As the output below shows, the underlined text is correctly wrapped in <u>, but the strike-through text has no wrapper at all. Any ideas?

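import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.swing.JEditorPane;
import javax.swing.text.StyledEditorKit;

public void testRtfToHtml()
{
    JEditorPane pane = new JEditorPane();
    pane.setContentType("text/rtf");

    // The RTF kit extends StyledEditorKit, so the cast is safe.
    StyledEditorKit kitRtf = (StyledEditorKit) pane.getEditorKitForContentType("text/rtf");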

    try
    {
        kitRtf.read(
            new StringReader(
                "{\\rtf1\\ansi \\deflang1033\\deff0{\\fonttbl {\\f0\\froman \\fcharset0 \\fprq2 Times New Roman;}}{\\colortbl;\\red0\\green0\\blue0;} {\\stylesheet{\\fs20 \\snext0 Normal;}} {\\plain \\fs26 \\strike\\fs26 This is supposed to be strike-through.}{\\plain \\fs26 \\fs26  } {\\plain \\fs26 \\ul\\fs26 Underline text here} {\\plain \\fs26 \\fs26 .{\\u698\\'20}}"),
            pane.getDocument(), 0);

        // The HTML kit also extends StyledEditorKit; use it to write the parsed document as HTML.
        StyledEditorKit kitHtml =
            (StyledEditorKit) pane.getEditorKitForContentType("text/html");

        Writer writer = new StringWriter();
        kitHtml.write(writer, pane.getDocument(), 0, pane.getDocument().getLength());
        System.out.println(writer.toString());
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
}

Output:

<html>
  <head>
    <style>
      <!--
        p.Normal {
          RightIndent:0.0;
          FirstLineIndent:0.0;
          LeftIndent:0.0;
        }
      -->
    </style>
  </head>
  <body>
    <p class=default>
              <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
This is supposed to be strike-through.
      </span>
      <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">

      </span>
       <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
<u>Underline text here</u>
      </span>
       <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
.?
      </span>

    </p>
  </body>
</html>
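
One way to narrow this down is to look at what the RTF reader actually stored in the document before the HTML writer ever runs. The following is only a minimal diagnostic sketch: the class name `RtfAttributeDump` and the shortened RTF sample are made up for illustration, and it uses `RTFEditorKit` directly instead of going through `JEditorPane`. It prints each character run together with the names of its attributes. If the struck run carries no strike-through attribute, or carries it under a key the HTML writer does not know about, the formatting is probably being lost on the reading side rather than the writing side.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import javax.swing.text.Element;
import javax.swing.text.StyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class, not part of the original question.
class RtfAttributeDump {

    /** Parses the RTF and returns one line per character run: its text and attribute names. */
    static List<String> dump(String rtf) throws Exception {
        RTFEditorKit kit = new RTFEditorKit();
        StyledDocument doc = (StyledDocument) kit.createDefaultDocument();
        kit.read(new StringReader(rtf), doc, 0);

        List<String> lines = new ArrayList<>();
        int pos = 0;
        while (pos < doc.getLength()) {
            // The leaf element covering this offset holds the character attributes.
            Element run = doc.getCharacterElement(pos);
            int end = Math.min(run.getEndOffset(), doc.getLength());
            String text = doc.getText(pos, end - pos);

            List<String> names = new ArrayList<>();
            Enumeration<?> keys = run.getAttributes().getAttributeNames();
            while (keys.hasMoreElements()) {
                names.add(String.valueOf(keys.nextElement()));
            }
            lines.add("\"" + text + "\" -> " + names);
            pos = end;
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample modelled on the RTF in the question (assumption: same header, less text).
        String rtf = "{\\rtf1\\ansi \\deflang1033\\deff0{\\fonttbl {\\f0\\froman \\fcharset0 \\fprq2 Times New Roman;}}"
            + "{\\colortbl;\\red0\\green0\\blue0;} "
            + "{\\plain \\fs26 \\strike\\fs26 struck} {\\plain \\fs26 \\ul\\fs26 underlined}}";
        for (String line : dump(rtf)) {
            System.out.println(line);
        }
    }
}
```

Comparing the attribute names of the struck run with those of the underlined run should show whether the two are being handled differently.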

Answer

Mike B · Mar 13, 2014

You could try converting with OpenOffice or LibreOffice, using this converter library as described in this blog post.
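
As a rough sketch of the same idea without any extra library, the conversion can also be attempted by calling LibreOffice's command-line launcher from Java. This is not the converter library mentioned above, just a plain `ProcessBuilder` call. It assumes that `soffice` is installed and on the PATH, and the class name `SofficeRtfToHtml` is made up for illustration. Whether strike-through survives in the generated HTML would still need to be checked against the actual output.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not taken from the answer or the library it refers to.
class SofficeRtfToHtml {

    /** Builds the LibreOffice command line for a headless RTF to HTML conversion. */
    static List<String> buildCommand(Path input, Path outDir) {
        return Arrays.asList(
            "soffice",                      // assumption: the LibreOffice launcher is on PATH
            "--headless",                   // run without a GUI
            "--convert-to", "html",         // target format
            "--outdir", outDir.toString(),  // directory that receives the .html file
            input.toString());
    }

    /** Runs the conversion and returns the exit code of the soffice process. */
    static int convert(Path input, Path outDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outDir))
            .inheritIO()                    // show LibreOffice's own messages on the console
            .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java SofficeRtfToHtml <input.rtf> <output-directory>
        int exit = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("soffice exited with " + exit);
    }
}
```

Starting a separate LibreOffice process for every file is slow, so this only makes sense for occasional or batch conversions.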