How to save Hindi Characters in the Application Properties file in Java?

ashishjmeshram · Sep 6, 2011

We are trying to internationalize our Spring MVC web application into Hindi. When we copy the Hindi text into the properties file, the file shows small boxes in place of the Hindi characters.

When we run the application and view the JSP, it shows question marks (???????) in place of the Hindi characters.

Edit:

My properties file has the following contents.

login.message=\u0915\u0943\u092a\u092f\u093e \u0905\u092a\u0928\u0947 \u0916\u093e\u0924\u0947 \u092e\u0947\u0902 \u0932\u0949\u0917 \u0911\u0928 \u0915\u0930\u0928\u0947 \u0915\u0947 \u0932\u093f\u090f \u0928\u0940\u091a\u0947 \u0915\u0947 \u092b\u093e\u0930\u094d\u092e \u0915\u093e \u0909\u092a\u092f\u094b\u0917 \u0915\u0930\u0947\u0902

I used the following command to get this encoded string.

native2ascii -encoding utf-8 ApplicationResources_hi.properties gen\ApplicationResources_hi.properties

My JSP page has the following line in the head section:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Do I need to do anything else? Sorry, I may be missing something here.

Answer

Joachim Sauer · Sep 6, 2011

You actually have to get 2 separate steps right for this to work:

  1. getting the text from your .properties file to your Java code correctly and
  2. getting the text from your Java code to the browser in a way that it understands.

The first one is somewhat strange: .properties files are defined to use the ISO-8859-1 encoding, which doesn't support arbitrary Unicode characters. Luckily, .properties files support the same Unicode escapes that Java source code supports, namely \uxxxx.
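To see the escape handling in isolation, here is a minimal sketch that feeds escaped content to java.util.Properties and checks what comes out. The class name EscapedPropertiesDemo, the helper loadValue and the key greeting are made up for illustration; only Properties.load(InputStream) is the real API in question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class EscapedPropertiesDemo {

    // Parses .properties content the way Properties.load(InputStream) does:
    // the bytes are read as ISO-8859-1 and Unicode escapes are decoded.
    // (Hypothetical helper, not part of any library.)
    static String loadValue(String fileContent, String key) {
        Properties props = new Properties();
        byte[] bytes = fileContent.getBytes(StandardCharsets.ISO_8859_1);
        try {
            props.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes literal in this Java string,
        // so the loader sees the same text as in an escaped .properties file.
        String content = "greeting=\\u0915\\u0943\\u092a\\u092f\\u093e\n";
        String value = loadValue(content, "greeting");
        System.out.println(value.length());                       // prints 5
        System.out.println(Integer.toHexString(value.charAt(0))); // prints 915
    }
}
```

If the loaded value has the expected length and code units, step 1 is working and any remaining problem is on the output side.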

Now writing those escapes by hand can become nasty, so there are basically two alternatives:

  • write a .properties file with the encoding of your choice (UTF-8 probably) and use native2ascii to convert it to the proper encoding
  • use a dedicated .properties editor that already does this (and usually a bit more) behind the scenes
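For illustration, the conversion that the first alternative relies on can be approximated in a few lines of Java. This is only a sketch of the idea, not the implementation of native2ascii: the names PropertiesEscaper and escapeNonAscii are hypothetical, and it deliberately ignores other .properties syntax such as existing backslashes or line continuations.

```java
class PropertiesEscaper {

    // Replaces every char outside the ASCII range with a backslash-u escape,
    // one escape per UTF-16 code unit. ASCII text is copied unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append('\\').append('u').append(String.format("%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hindi text written with Java escapes so this source file stays pure ASCII.
        String hindi = "\u0932\u0949\u0917 \u0911\u0928";
        // The key "login.label" is an assumed example key.
        System.out.println("login.label=" + escapeNonAscii(hindi));
    }
}
```

The printed line is plain ASCII, so it survives being stored and read as ISO-8859-1.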

Once this works (verify it in the debugger by looking at the character values of some localized strings, for example with String.charAt()), you need to make sure that the browser actually receives the data correctly.
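If a debugger is not at hand, the same check can be done by printing the code units. The following is a small sketch with hypothetical names (CharDump, codeUnits); the two sample strings are stand-ins for a message that loaded correctly and one that was already damaged before reaching the JSP.

```java
class CharDump {

    // Formats each UTF-16 code unit of the string as U+XXXX, separated by spaces.
    static String codeUnits(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String loadedCorrectly = "\u0915\u0930\u0947\u0902"; // stand-in for a correctly loaded message
        String alreadyBroken = "????";                         // stand-in for a message damaged while loading
        System.out.println(codeUnits(loadedCorrectly)); // U+0915 U+0930 U+0947 U+0902
        System.out.println(codeUnits(alreadyBroken));   // U+003F U+003F U+003F U+003F
    }
}
```

Values in the U+0900 to U+097F range are Devanagari; a run of U+003F means the text was already replaced by question marks inside the Java code, so the problem is in step 1 rather than in the output encoding.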

The easiest way here is to make sure that you use UTF-8 encoding to get the data to the browser, since UTF-8 can represent all possible Unicode codepoints.
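The effect of the output encoding can be sketched without a servlet container. The example below only mimics a server encoding text and a browser decoding it with the same charset; ResponseEncodingDemo and roundTrip are hypothetical names, and a real response additionally depends on the container and page configuration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ResponseEncodingDemo {

    // Encodes the text with the given charset and decodes it again,
    // mimicking a server writing bytes and a browser reading them back.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String hindi = "\u0915\u0943\u092a\u092f\u093e";

        // UTF-8 can represent every character, so the text survives unchanged.
        System.out.println(roundTrip(hindi, StandardCharsets.UTF_8).equals(hindi));        // true

        // ISO-8859-1 has no Devanagari, so each char is replaced by '?'.
        System.out.println(roundTrip(hindi, StandardCharsets.ISO_8859_1).equals("?????")); // true
    }
}
```

This is the same kind of substitution that produces the question marks described in the question when the response is written with a charset that lacks the Hindi characters.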

If you're using JSP to produce your output, then you can use something like this to specify that you want UTF-8 output:

<%@ page contentType="text/html; charset=utf-8" language="java" %>

See the Java EE Tutorial for details.