How to search for comments ("<!-- -->") using Jsoup?

87element · Sep 24, 2011

I would like to remove those comment tags, together with their content, from the source HTML.

Answer

dlamblin · Sep 24, 2011

When searching, you basically use Elements.select(selector), where the selector syntax is defined by this API. Comments, however, are technically not elements, which can be confusing here. They are still nodes, identified by the node name #comment.

Let's see how that might work:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

public class RemoveComments {
    public static void main(String... args) {
        String h = "<html><head></head><body>" +
          "<div><!-- foo --><p>bar<!-- baz --></div><!--qux--></body></html>";
        Document doc = Jsoup.parse(h);
        removeComments(doc);
        doc.html(System.out);
    }

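    // Recursively removes every comment node below the given node.
    private static void removeComments(Node node) {
        // No i++ in the loop header: removing a child shifts the next
        // sibling into index i, so the index only advances when nothing
        // was removed.
        for (int i = 0; i < node.childNodeSize();) {
            Node child = node.childNode(i);
            if (child.nodeName().equals("#comment"))
                child.remove();
            else {
                removeComments(child);
                i++;
            }
        }
    }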
}
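
The same idea can also be sketched without manual index handling. The version below is only an illustration, assuming a jsoup version that provides Node.traverse(NodeVisitor) and the Comment node class. The class name RemoveCommentsWithVisitor and the int return value are made up for this example. It collects the comment nodes first and removes them afterwards, so the tree is not modified while it is being walked.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this sketch.
public class RemoveCommentsWithVisitor {

    // Removes all comment nodes at or below root and returns how many were removed.
    static int removeComments(Node root) {
        final List<Node> comments = new ArrayList<>();

        // First pass: only collect, do not touch the tree during traversal.
        root.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof Comment) {
                    comments.add(node);
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Nothing to do when leaving a node.
            }
        });

        // Second pass: detach every collected comment from its parent.
        for (Node comment : comments) {
            comment.remove();
        }
        return comments.size();
    }

    public static void main(String[] args) {
        String h = "<html><head></head><body>" +
          "<div><!-- foo --><p>bar<!-- baz --></div><!--qux--></body></html>";
        Document doc = Jsoup.parse(h);
        int removed = removeComments(doc);
        System.out.println("Removed " + removed + " comment(s)");
        System.out.println(doc.html());
    }
}
```

Checking instanceof Comment instead of comparing nodeName() with "#comment" is a matter of taste; both identify the same nodes.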