Why does JDOM's getChild() method return null?

Arun · Mar 10, 2011 · Viewed 13.3k times

I'm working on a project that manipulates HTML documents. I want to take the body content of an existing HTML document and modify it to produce a new HTML document. I'm using JDOM for this, and I call getChild("body") to get the body element, but it returns null even though my HTML document does have a body element. I'm a student; could anybody help me understand this problem?

I would appreciate any pointers.

Code:

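import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;

// build() can throw JDOMException or IOException, hence the throws clause
public static void getBody() throws Exception {
    // TagSoup is used as the SAX driver so that ordinary HTML can be parsed
    SAXBuilder builder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser", true);
    Document jdomDocument = builder.build("http://www......com");
    Element root = jdomDocument.getRootElement();
    // It returns null
    System.out.println(root.getChild("body"));
}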

Please see the following as well. This is my HTML's root and its children as printed to the console:

root.getName():html

SIZE:2

[Element: <head [Namespace: http://www.w3.org/1999/xhtml]/>]

[Element: <body [Namespace: http://www.w3.org/1999/xhtml]/>]

Answer

javanna · Mar 10, 2011

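I found a few things to check in your code: 1) If you want to build a document from a remote resource, it is clearer to use the build method that takes a URL. build(String) treats its argument as a system ID (a URI) rather than as markup, and your console output suggests the page was fetched, but passing a URL object makes the intent explicit: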

Document jdomDocument = builder.build( new URL("http://www........com"));

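2) If you want to parse an HTML page as XML, it has to be well-formed XHTML; otherwise a standard XML parser will reject it. TagSoup, which you pass to SAXBuilder, is designed to cope with malformed HTML, so this matters mainly if you switch to another parser.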

3) As I said in another answer, root.getChild("body") returns the child of root that is named "body" and is in no namespace. Check the namespace of the element you are looking for; if it is in a namespace, you have to pass that namespace as well:

root.getChild("body", Namespace.getNamespace("your_namespace_uri"));

An easy way to find out which namespace your elements are in is to print all of root's children using the getChildren method:

for (Object element : root.getChildren()) {
    System.out.println(element.toString());
}

If you are parsing XHTML, the namespace URI is most likely http://www.w3.org/1999/xhtml, which is also what your console output shows. In that case you should do this:

root.getChild("body", Namespace.getNamespace("http://www.w3.org/1999/xhtml"));
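To make the matching rule concrete, here is a minimal sketch of the same idea using the JDK's built-in DOM API instead of JDOM, so it runs without extra jars. The class NamespaceLookupSketch and its helpers parseRoot and findChild are hypothetical names made up for this illustration; they are not part of JDOM. The helper only mimics the rule described above, namely that a child matches only when both its local name and its namespace URI match, with an empty string standing for "no namespace".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; uses the JDK DOM API, not JDOM.
class NamespaceLookupSketch {

    // Parses an XML string with namespace awareness switched on.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    // Returns the first direct child whose local name AND namespace URI
    // both match, or null. An empty nsUri stands for "no namespace".
    static Element findChild(Element parent, String localName, String nsUri) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            String childNs = n.getNamespaceURI() == null ? "" : n.getNamespaceURI();
            if (localName.equals(n.getLocalName()) && nsUri.equals(childNs)) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head/><body/></html>";
        Element root = parseRoot(xhtml);

        // Name alone: body is in the XHTML namespace, so nothing matches.
        boolean plain = findChild(root, "body", "") != null;
        // Name plus namespace URI: this finds the element.
        boolean qualified = findChild(root, "body", "http://www.w3.org/1999/xhtml") != null;

        System.out.println("match without namespace: " + plain);
        System.out.println("match with namespace: " + qualified);
    }
}
```

The first lookup corresponds to root.getChild("body") in the question and finds nothing; the second corresponds to the two-argument getChild call with the XHTML namespace and finds the element.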