Unicode input retrieved via PrimeFaces input components become corrupted

Question 1

jsf unicode primefaces character-encoding mojibake

Mr.J4mes · Mar 9, 2012 · Viewed 13.7k times · Source

Answer

Introduction

Normally, JSF/Facelets sets the request parameter character encoding to UTF-8 by default when the view is created or restored. But if any request parameter is read before the view is created or restored, it's too late to set the proper character encoding, because the request parameters are parsed only once.
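
To make the "parsed only once" point concrete, here is a minimal simulation. The `ParseOnceRequest` class is a hypothetical stand-in written for this illustration, not the real container implementation, and the ISO-8859-1 default is an assumption about the server. It only shows why setting the encoding after the first parameter access has no effect:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a servlet request; NOT the real container code.
class ParseOnceRequest {
    private final byte[] rawBody;                            // bytes as sent by the browser
    private Charset encoding = StandardCharsets.ISO_8859_1;  // assumed server default
    private String parsedValue;                              // cached after the first access

    ParseOnceRequest(byte[] rawBody) {
        this.rawBody = rawBody;
    }

    void setCharacterEncoding(Charset encoding) {
        this.encoding = encoding;
    }

    String getParameter() {
        if (parsedValue == null) {
            // Decoded exactly once with whatever encoding is set at this moment.
            parsedValue = new String(rawBody, encoding);
        }
        return parsedValue;
    }

    public static void main(String[] args) {
        byte[] body = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);

        // Encoding set before the first parameter access: input survives.
        ParseOnceRequest early = new ParseOnceRequest(body);
        early.setCharacterEncoding(StandardCharsets.UTF_8);
        System.out.println(early.getParameter().equals("\u4e2d\u6587")); // true

        // A parameter is read first; setting the encoding afterwards is too late.
        ParseOnceRequest late = new ParseOnceRequest(body);
        late.getParameter();
        late.setCharacterEncoding(StandardCharsets.UTF_8);
        System.out.println(late.getParameter().equals("\u4e2d\u6587")); // false
    }
}
```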

PrimeFaces encoding fail

The failure in PrimeFaces 3.x after upgrading from 2.x is caused by the new isAjaxRequest() override in PrimeFaces' PrimePartialViewContext, which checks a request parameter:

@Override
public boolean isAjaxRequest() {
    return getWrapped().isAjaxRequest()
            || FacesContext.getCurrentInstance().getExternalContext().getRequestParameterMap().containsKey("javax.faces.partial.ajax");
}

By default, isAjaxRequest() (the Mojarra/MyFaces implementation, which the above PrimeFaces code obtains via getWrapped()) checks a request header as follows. This does not affect the request parameter encoding, because request parameters aren't parsed when a request header is read:

    if (ajaxRequest == null) {
        ajaxRequest = "partial/ajax".equals(ctx.
            getExternalContext().getRequestHeaderMap().get("Faces-Request"));
    }

However, isAjaxRequest() may be called by any phase listener, system event listener, or application factory before the view is created or restored. So, when you're using PrimeFaces 3.x, the request parameters are parsed before the proper character encoding is set and hence use the server's default encoding, which is usually ISO-8859-1. Every non-ASCII character in the submitted input then gets corrupted.
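
The effect of decoding UTF-8 input with the wrong charset can be reproduced in plain Java. This is only a sketch: `MojibakeDemo` and `roundTrip` are hypothetical names, and it assumes the browser submits UTF-8 while the server decodes with ISO-8859-1. It also shows why plain ASCII input appears unaffected:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of JSF, PrimeFaces, or any container.
class MojibakeDemo {

    // Encodes the input as UTF-8 (as the browser would) and decodes it
    // with the charset the server is assumed to use.
    static String roundTrip(String input, Charset serverCharset) {
        byte[] submitted = input.getBytes(StandardCharsets.UTF_8);
        return new String(submitted, serverCharset);
    }

    public static void main(String[] args) {
        String ascii = "abc";
        String chinese = "\u4e2d\u6587";

        // ASCII bytes are identical in UTF-8 and ISO-8859-1, so they survive.
        System.out.println(roundTrip(ascii, StandardCharsets.ISO_8859_1).equals(ascii)); // true

        // Each Chinese character is 3 bytes in UTF-8; ISO-8859-1 turns
        // every byte into its own character, which yields mojibake.
        String broken = roundTrip(chinese, StandardCharsets.ISO_8859_1);
        System.out.println(broken.equals(chinese)); // false
        System.out.println(broken.length());        // 6

        // With matching charsets the input arrives intact.
        System.out.println(roundTrip(chinese, StandardCharsets.UTF_8).equals(chinese)); // true
    }
}
```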

Solutions

There are several ways to fix it:

Use a servlet filter which calls ServletRequest#setCharacterEncoding() with UTF-8. Setting the response encoding via ServletResponse#setCharacterEncoding() is unnecessary, as the response is not affected by this issue.
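```
import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;

@WebFilter("/*")
public class CharacterEncodingFilter implements Filter {

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws ServletException, IOException {
        // Must run before anything reads a request parameter.
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    // ...
}
```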
Keep in mind that HttpServletRequest#setCharacterEncoding() only sets the encoding for POST request parameters, not for GET request parameters. For GET request parameters you'd still need to configure it at server level.

If you happen to use the JSF utility library OmniFaces, such a filter is already provided out of the box: the CharacterEncodingFilter. Just install it as below in web.xml as the first filter entry:
```
<filter>
    <filter-name>characterEncodingFilter</filter-name>
    <filter-class>org.omnifaces.filter.CharacterEncodingFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>characterEncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
```
Reconfigure the server to use UTF-8 instead of ISO-8859-1 as the default encoding. In the case of GlassFish, that is a matter of adding the following entry to <glassfish-web-app> of the /WEB-INF/glassfish-web.xml file:
```
<parameter-encoding default-charset="UTF-8" />
```
Tomcat doesn't support it. It has the URIEncoding attribute in the <Connector> entry, but this applies to GET requests only, not to POST requests.

Report it as a bug to PrimeFaces. Is there really any legitimate reason to detect an ajax request by checking a request parameter instead of a request header, as standard JSF and for example jQuery do? PrimeFaces' core.js JavaScript is what sets that parameter. It would be better if it set it as a request header on the XMLHttpRequest.
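
As noted for the servlet filter above, setCharacterEncoding() does not cover GET request parameters, which travel percent-encoded in the URL. The sketch below only mimics that decoding step with the JDK's own URLDecoder. The `QueryDecodingDemo` class is hypothetical, and the assumption is that the browser encodes the query string as UTF-8 while the server decodes it with its own default:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; it only mimics how a query string value is decoded.
class QueryDecodingDemo {

    // Decodes a percent-encoded query value with the given charset name.
    static String decode(String encoded, String charset) throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String value = "\u4e2d\u6587";

        // What ends up in the URL when the browser encodes the value as UTF-8.
        String encoded = URLEncoder.encode(value, "UTF-8");
        System.out.println(encoded); // %E4%B8%AD%E6%96%87

        // Decoding with the same charset restores the value.
        System.out.println(decode(encoded, "UTF-8").equals(value)); // true

        // Decoding with ISO-8859-1 corrupts it, regardless of any servlet filter.
        System.out.println(decode(encoded, "ISO-8859-1").equals(value)); // false
    }
}
```

This is why the URI encoding has to be configured at server level in addition to the filter.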

Solutions which do NOT work

Perhaps you'll stumble upon the "solutions" below somewhere on the Internet while investigating this problem. They won't ever work in this specific case. An explanation follows each one.

Setting XML prolog:
```
<?xml version='1.0' encoding='UTF-8' ?>
```
This only tells the XML parser to use UTF-8 to decode the XML source before building the XML tree around it. The XML parser actually used by Facelets during JSF view build time is SAX. This part has nothing to do with HTTP request/response encoding.
Setting HTML meta tag:
```
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
```
The HTML meta tag is ignored when the page is served over HTTP via a http(s):// URI. It's only used when the client saves the page as an HTML file on the local disk and then reopens it via a file:// URI in the browser.
Setting HTML form accept charset attribute:
```
<h:form accept-charset="UTF-8">
```
Modern browsers ignore this. It only has an effect in Microsoft Internet Explorer, and even there it behaves incorrectly. Never use it. All real web browsers instead use the charset attribute specified in the Content-Type header of the response. Even MSIE does it the right way as long as you do not specify the accept-charset attribute.
Setting JVM argument:
```
-Dfile.encoding=UTF-8
```
This is only used by the Oracle(!) JVM to read and parse the Java source files.

Question 2

When I was still using PrimeFaces v2.2.1, I was able to type Unicode input such as Chinese into a PrimeFaces input component such as <p:inputText> or <p:editor>, and retrieve the input in good shape in a managed bean method.

However, after I upgraded to PrimeFaces v3.1.1, all those characters become mojibake or question marks. Only Latin input comes through fine; Chinese, Arabic, Hebrew, Cyrillic, and similar characters become malformed.

How is this caused and how can I solve it?
