Insert HTML into the Body of an HTMLDocument

Robbie · Aug 12, 2010

This seems like such a simple question, but I'm having such difficulty with it.

Problem:

I have some text to insert into an HTMLDocument. This text sometimes contains some HTML as well, e.g.:

Some <br />Random <b>HTML</b>

I'm using HTMLEditorKit.insertHTML to insert it at a specified offset. This works fine unless the offset is at the beginning of the doc (offset = 1). In that case the text gets inserted into the head of the document instead of the body.

Example:

editorKitInstance.insertHTML(doc, offset, "<font>"+stringToInsert+"</font>", 0, 0, HTML.Tag.FONT);

I use the font tag so I know that what I'm inserting will be in a font tag with no attributes, so it won't affect the format. I need this because the last parameter, insertTag, is required and I can't know the contents of stringToInsert until runtime. If there is already text in the doc (such as "1234567890"), this is the output:

<html>
  <head>

  </head>
  <body>
    <p style="margin-top: 0">
      1234567890 <font>something <br />Some <br />Random <b>HTML</b></font>
    </p>
  </body>
</html>

However, if the offset is 1 and the document is empty, this is the result:

<html>
  <head>

<font>Some <br />Random <b>HTML</b></font>
  </head>
  <body>
  </body>
</html>

Other Notes:

  • This is all being done on the inner document of a JEditorPane. If there is a better way to replace text in a JEditorPane with potential HTML, I would be open to those ideas as well.

Any help would be appreciated. Thanks!

Answer

tigger · Aug 19, 2010

There are several things you should know about the internal structure of the HTMLDocument.

  • First of all, the body does not start at position 0. All textual content of the document is stored in an instance of javax.swing.text.AbstractDocument$Content. This includes the title and script tags as well. The position/offset argument of ANY document and editor kit function refers to the text in this Content instance! You have to determine the start of the body element to insert content into the body correctly. BTW: even though you didn't define a body element in your HTML, it will be auto-generated by the parser.
  • Simply inserting at a position tends to have unexpected side effects. You need to know where you want to put the content in relation to the (HTML) elements at this position. E.g. if you have the following text in your document: "...</span><span>...", there is only one position (referring to the Content instance) for "at the end of the first span", "between the spans" and "at the start of the second span". To solve this problem there are four methods in the HTMLDocument API:
    • insertAfterEnd
    • insertAfterStart
    • insertBeforeEnd
    • insertBeforeStart
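To make the two points above concrete, here is a minimal sketch. It assumes a standalone HTMLDocument created with HTMLEditorKit.createDefaultDocument() rather than the one inside a JEditorPane, and the class and helper names (BodyOffsetDemo, load, find, bodyText) are hypothetical. It prints where BODY starts within the shared Content, then places paragraphs with insertAfterStart and insertBeforeEnd relative to the BODY element instead of a raw offset. Elements are located with HTMLDocument.getElement(Element, Object, Object).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class BodyOffsetDemo {

    // Parse a complete HTML string into a fresh HTMLDocument.
    static HTMLDocument load(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);
        return doc;
    }

    // Depth-first search from the root for the first element named `tag`.
    static Element find(HTMLDocument doc, HTML.Tag tag) {
        return doc.getElement(doc.getDefaultRootElement(), StyleConstants.NameAttribute, tag);
    }

    // Text covered by the BODY element, clamped to getLength() so the
    // implicit trailing newline of the Content is not requested.
    static String bodyText(HTMLDocument doc) throws BadLocationException {
        Element body = find(doc, HTML.Tag.BODY);
        int start = body.getStartOffset();
        int end = Math.min(body.getEndOffset(), doc.getLength());
        return doc.getText(start, end - start);
    }

    public static void main(String[] args) throws Exception {
        HTMLDocument doc = load(
            "<html><head><title>Demo</title></head><body><p>hello</p></body></html>");

        // Offsets index the whole Content, so BODY need not start at 0.
        System.out.println("body starts at " + find(doc, HTML.Tag.BODY).getStartOffset());

        // Position new content relative to BODY; re-fetch BODY after each
        // structural change instead of reusing a possibly stale Element.
        doc.insertAfterStart(find(doc, HTML.Tag.BODY), "<p>first</p>");
        doc.insertBeforeEnd(find(doc, HTML.Tag.BODY), "<p>last</p>");

        System.out.println(bodyText(doc).replace("\n", "|"));
    }
}
```

The exact start offset printed depends on what the parser stores for the head, so treat it as illustrative rather than a fixed value.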

In conclusion: for a general solution, you have to find the BODY element and tell the document to "insertAfterStart" of the body, i.e. at the start offset of the body element.

The following snippet should work in any case:

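import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;

HTMLDocument htmlDoc = ...;
Element[] roots = htmlDoc.getRootElements(); // #0 is the HTML element, #1 the bidi-root
Element body = null;
// Walk the children of the HTML element (HEAD, BODY) looking for BODY
for( int i = 0; i < roots[0].getElementCount(); i++ ) {
    Element element = roots[0].getElement( i );
    if( element.getAttributes().getAttribute( StyleConstants.NameAttribute ) == HTML.Tag.BODY ) {
        body = element;
        break;
    }
}
// insertAfterStart throws BadLocationException and IOException; handle or declare them
if( body != null ) {
    htmlDoc.insertAfterStart( body, "<font>text</font>" );
}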
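If you prefer not to walk the children by hand, HTMLDocument.getElement(Element, Object, Object) does a depth-first search for an element with a given attribute value. The sketch below applies it to the case from the question (an empty body behind a head). It assumes a freshly parsed document rather than the JEditorPane's own, and the class and method names (InsertIntoBody, insertAtBodyStart) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class InsertIntoBody {

    // Parse `page`, insert `fragment` as the first content of BODY and
    // return the serialized document. Throws if there is no BODY element.
    static String insertAtBodyStart(String page, String fragment) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(page), doc, 0);

        // Replaces the manual loop: search from the root for the element
        // whose NameAttribute is HTML.Tag.BODY.
        Element body = doc.getElement(doc.getDefaultRootElement(),
                StyleConstants.NameAttribute, HTML.Tag.BODY);
        if (body == null) {
            throw new IllegalStateException("no BODY element");
        }
        doc.insertAfterStart(body, fragment);

        // Serialize the whole document so the result can be inspected.
        StringWriter out = new StringWriter();
        kit.write(out, doc, 0, doc.getLength());
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String stringToInsert = "Some <br />Random <b>HTML</b>";
        System.out.println(insertAtBodyStart(
                "<html><head></head><body></body></html>",
                "<font>" + stringToInsert + "</font>"));
    }
}
```

Searching from the default root also finds BODY if it is not a direct child of the HTML element, which the index-based loop would miss.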

If you're sure that the header is always empty, there is another way:

kit.read( new StringReader( "<font>test</font>" ), htmlDoc, 1 );

But this will throw a RuntimeException if the header is not empty.

By the way, I prefer to use JWebEngine to handle and render HTML content since it keeps header and content separated, so inserting at position 0 always works.