After a lot of trial and error I still can't figure out the problem. The JSP, servlet, and database are all set to accept UTF-8 encoding, but even still whenever I use request.getParameter on anything that has any two-byte characters like the em dash they get scrambled up as broken characters.
I've made manual submissions to the database and it's able to accept these characters, no problem. And if I pull the text from the database in a servlet and print it in my jsp page's form it displays no problem.
The only time I've found that it comes back as broken characters is when I try and display it elsewhere after retrieving it using request.getParameter.
Has anyone else had this problem? How can I fix it?
That can happen if request and/or response encoding isn't properly set at all.
For GET requests, you need to configure it at the servletcontainer level. It's unclear which one you're using, but for in example Tomcat that's to be done by URIEncoding
attribute in <Connector>
element in its /conf/server.xml
.
<Connector ... URIEncoding="UTF-8">
For POST requests, you need to create a filter which is mapped on the desired URL pattern covering all those POST requests. E.g. *.jsp
or even /*
. Do the following job in doFilter()
:
request.setCharacterEncoding("UTF-8");
chain.doFilter(request, response);
For HTML responses and client side encoding of submitted HTML form input values, you need to set the JSP page encoding. Add this to top of the JSP (you've probably already done it properly given the fact that displaying UTF-8 straight form DB works fine).
<%@page pageEncoding="UTF-8" %>
Or to prevent copypasting this over every single JSP, configure it once in web.xml
:
<jsp-config>
<jsp-property-group>
<url-pattern>*.jsp</url-pattern>
<page-encoding>UTF-8</page-encoding>
</jsp-property-group>
</jsp-config>
For source code files and stdout (IDE console), you need to set the IDE workspace encoding. It's unclear which one you're using, but for in example Eclipse that's to be done by setting Window > Preferences > General > Workspace > Text File Encoding to UTF-8.
Do note that HTML <meta http-equiv>
tags are ignored when page is served over HTTP. It's only considered when page is opened from local disk file system via file://
. Also specifying <form accept-charset>
is unnecessary as it already defaults to response encoding used during serving the HTML page with the form. See also W3 HTML specification.