How can I traverse the HTML tree using Jsoup?

Renato Dinhani · Apr 11, 2012 · Viewed 9.7k times

I think this question has been asked before, but I couldn't find anything.

Starting from the Document object in Jsoup, how can I traverse all the elements in the HTML content?

I was reading the documentation and thinking about using the childNodes() method, but as far as I understand it only returns the nodes one level below. I could probably use recursion with this method, but I want to know if there is a more appropriate/native way to do this.
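
For reference, the recursive childNodes() approach mentioned above could look roughly like the sketch below. The class name RecursiveWalk and the helpers collect and walk are made up for illustration; only Jsoup.parse, Node.childNodes() and Node.nodeName() come from Jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

import java.util.ArrayList;
import java.util.List;

public class RecursiveWalk {
    // Hypothetical helper: records the name of this node, then recurses
    // into each direct child (pre-order, i.e. document order).
    static void collect(Node node, List<String> out) {
        out.add(node.nodeName());
        for (Node child : node.childNodes()) {
            collect(child, out);
        }
    }

    // Hypothetical helper: parses the HTML and returns every node name,
    // starting from the Document node itself.
    static List<String> walk(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        collect(doc, names);
        return names;
    }

    public static void main(String[] args) {
        for (String name : walk("<p>Hello <b>world</b></p>")) {
            System.out.println(name);
        }
    }
}
```

This visits text nodes as well as elements, and very deeply nested markup could in principle exhaust the call stack, which is part of why I am asking about a built-in alternative.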

Answer

Vivien Barousse · Apr 11, 2012

From a Document (or any other Node subclass), you can use the traverse(NodeVisitor) method, which walks the node and all of its descendants depth-first.

For example:

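import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

document.traverse(new NodeVisitor() {
    @Override
    public void head(Node node, int depth) {
        // Called when a node is first visited, before its children
        System.out.println("Entering tag: " + node.nodeName());
    }
    @Override
    public void tail(Node node, int depth) {
        // Called after all of the node's children have been visited
        System.out.println("Exiting tag: " + node.nodeName());
    }
});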
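
If it helps, here is a minimal self-contained sketch of the same idea that you can compile and run. It uses the depth argument to indent the output and skips anything that is not an Element (text nodes, comments and so on). The class name TraverseDemo and the helper outline are hypothetical names chosen for this example, not part of Jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

import java.util.ArrayList;
import java.util.List;

public class TraverseDemo {
    // Hypothetical helper: returns one line per Element, indented by
    // two spaces per level of depth (the root is at depth 0).
    static List<String> outline(String html) {
        Document doc = Jsoup.parse(html);
        final List<String> lines = new ArrayList<>();
        doc.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof Element) { // ignore text nodes, comments, etc.
                    StringBuilder sb = new StringBuilder();
                    for (int i = 0; i < depth; i++) {
                        sb.append("  ");
                    }
                    lines.add(sb.append(node.nodeName()).toString());
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Nothing to do when leaving a node in this sketch
            }
        });
        return lines;
    }

    public static void main(String[] args) {
        for (String line : outline("<p>Hello <b>world</b></p>")) {
            System.out.println(line);
        }
    }
}
```

Note that Document is itself an Element, so the first line printed is the document node. If you only need the elements and don't care about enter/exit callbacks, Element.getAllElements() may be simpler.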