JSP and HTML parser for Java

Karl Cheng · Oct 30, 2012 · Viewed 7.7k times

I have been using Jsoup to parse my HTML files, and so far it has done a great job. However, it cannot parse server tags (<% ... %>). I decided to extend it, but I cannot find an easy way to extend its Parser and all the private/package-level classes it relies on (e.g. TreeBuilder, TransitionState, etc.).

So I started looking at Jericho, since it claims to parse server tags. However, its documentation is so poor that I cannot even get started, and its API seems less friendly than Jsoup's: it is not as straightforward to extract some nodes and move them around.

Has anyone been in a similar situation before, and how did you solve it? In short, I just want to parse JSP files in Java. (Well... please don't ask me to implement one myself ;p)

Answer

Karl Cheng · Nov 6, 2012

In the end I found a workaround: put each server code block inside an HTML comment, so that 1) the server code still gets executed correctly, and 2) Jsoup processes the whole block as an HTML comment node without touching anything inside it.

For example:

<!--
<%@ page language="java" errorPage="/error.jsp" pageEncoding="UTF-8" contentType="text/html;charset=UTF-8" %>
<%@ page import="com.systemcrossed.groupbuystart.webapp.display.DisplayHelper" %>
<%@ page import="com.systemcrossed.groupbuystart.webapp.util.JsonUtil" %>
<%@ page import="org.apache.commons.lang.StringEscapeUtils" %>
<%@ include file="/_sys/pages/public/incl/jspCommon.jsp" %>
-->
<!--<%
    // Java code here
%>-->
<html>
<head>
    ... html stuff
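If you have many JSP files to convert, the wrapping step can be automated. Below is a minimal sketch using only java.util.regex; the class JspCommentWrapper and its method wrapServerBlocks are hypothetical names, not part of Jsoup or any JSP API. It assumes raw JSP input whose server blocks are not already commented out. It wraps directives (<%@ ... %>), scriptlets and declarations, but deliberately skips expressions (<%= ... %>), because whatever they print would end up hidden inside an HTML comment in the rendered page, and it skips JSP comments (<%-- ... --%>).

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Jsoup or JSP): wraps JSP directives,
// scriptlets and declarations in HTML comments so that an HTML parser
// such as Jsoup sees each of them as a single comment node.
class JspCommentWrapper {

    // Matches "<%" NOT followed by '=' (expression) or '-' (JSP comment),
    // up to the nearest "%>". DOTALL lets a block span several lines.
    private static final Pattern SERVER_BLOCK =
            Pattern.compile("<%(?![=-]).*?%>", Pattern.DOTALL);

    static String wrapServerBlocks(String jsp) {
        // "$0" is the whole match, so each block becomes <!--block-->.
        return SERVER_BLOCK.matcher(jsp).replaceAll("<!--$0-->");
    }

    public static void main(String[] args) {
        String jsp = "<%@ page pageEncoding=\"UTF-8\" %>\n"
                + "<% String name = \"world\"; %>\n"
                + "<p>Hello <%= name %></p>";
        // The directive and scriptlet get wrapped; the expression is left alone.
        System.out.println(wrapServerBlocks(jsp));
    }
}
```

The wrapped string can then be handed to Jsoup.parse(String). Note that a regex cannot tell where a block sits, so a scriptlet placed inside a tag (e.g. <input <% if (checked) { %>checked<% } %>>) would also be wrapped and break the markup; such cases still need to be fixed by hand.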

It works well for me now! I hope this helps anyone who runs into the same problem. ;)