I am using JTidy to convert HTML to XHTML, but I found this tag in my XHTML file
.
Can I prevent it?
This is my code:
    // from HTML to XHTML
    try
    {
        fis = new FileInputStream(htmlFileName);
    }
    catch (java.io.FileNotFoundException e)
    {
        System.out.println("File not found: " + htmlFileName);
    }
    Tidy tidy = new Tidy();
    tidy.setShowWarnings(false);
    tidy.setXmlTags(false);
    tidy.setInputEncoding("UTF-8");
    tidy.setOutputEncoding("UTF-8");
    tidy.setXHTML(true);
    tidy.setMakeClean(true);
    Document xmlDoc = tidy.parseDOM(fis, null);
    try
    {
        tidy.pprint(xmlDoc, new FileOutputStream("c.xhtml"));
    }
    catch (Exception e)
    {
    }
I only had success when the input was treated as XML as well. So either set xmltags to true
    tidy.setXmlTags(true);
and live with the errors and warnings, or do the conversion twice: a first pass to sanitize the HTML (HTML to XHTML), then a second pass from XHTML to XHTML with xmltags set, so that no errors or warnings occur.
    String htmlFileName = "test.html";

    // Pass 1: sanitize the HTML and write it out as XHTML to a temporary file.
    try (InputStream in = Thread.currentThread().getContextClassLoader().getResourceAsStream(htmlFileName);
         FileOutputStream fos = new FileOutputStream("tmp.xhtml")) {
        Tidy tidy = new Tidy();
        tidy.setShowWarnings(true);
        tidy.setInputEncoding("UTF-8");
        tidy.setOutputEncoding("UTF-8");
        tidy.setXHTML(true);
        tidy.setMakeClean(true);
        // parseDOM writes the tidied output to fos
        Document xmlDoc = tidy.parseDOM(in, fos);
    } catch (Exception e) {
        e.printStackTrace();
    }

    // Pass 2: read the XHTML back in as XML (xmltags set) and write the final file.
    try (InputStream in = new FileInputStream("tmp.xhtml");
         FileOutputStream fos = new FileOutputStream("c.xhtml")) {
        Tidy tidy = new Tidy();
        tidy.setShowWarnings(true);
        tidy.setXmlTags(true);
        tidy.setInputEncoding("UTF-8");
        tidy.setOutputEncoding("UTF-8");
        tidy.setXHTML(true);
        tidy.setMakeClean(true);
        Document xmlDoc = tidy.parseDOM(in, null);
        tidy.pprint(xmlDoc, fos);
    } catch (Exception e) {
        e.printStackTrace();
    }
I used the latest JTidy version, 938.
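To confirm that the second pass really produced well-formed output, you could run the result through the JDK's own XML parser. The following is only a rough sketch using standard `javax.xml.parsers` classes, not anything provided by JTidy; the class name `WellFormedCheck` and the method `isWellFormed` are made up for illustration. It also assumes you do not want the parser to download the XHTML DTD, so it resolves every external entity to an empty document.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks well-formedness only, it does not validate against the XHTML DTD.
class WellFormedCheck {

    static boolean isWellFormed(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Assumption: skip fetching the DTD named in the DOCTYPE by
        // resolving every external entity to an empty document.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        // DefaultHandler ignores warnings and recoverable errors, and rethrows fatal ones.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(in);
            return true;
        } catch (SAXException e) {
            return false; // the parser hit a well-formedness error
        }
    }

    // Convenience overload for in-memory strings.
    static boolean isWellFormed(String xml) throws Exception {
        return isWellFormed(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<html><body><p>ok</p></body></html>"));
        System.out.println(isWellFormed("<html><body><p>broken<br></p></body></html>"));
    }
}
```

For the files above, you would call `WellFormedCheck.isWellFormed(new FileInputStream("c.xhtml"))`. Because the DTD is not loaded, named entities such as `&nbsp;` may not be resolved, so treat a `false` result as a hint to inspect the file rather than as a definitive verdict.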