Java JSON text encoding issue

Hari Reddy · Mar 20, 2013 · Viewed 14.3k times

In my application I retrieve search results in JSON format from an external tool, the Google Search Appliance (GSA).

The JSON result from the GSA is very large, so I prefer to reduce it to something more suitable for displaying on my webpage.

If I display the GSA JSON result directly, without reformatting it in my Java code, I see no encoding issues on my webpage.

But if I reformat the large GSA JSON result into a smaller JSON document in my servlet code, I get encoding problems. Example: “All Access Pass” gets displayed as ÂAll Access PassÂ.

I return the modified JSON from my servlet to the webpage using the following code:

response.setContentType("application/json;charset=UTF-8");

I have tried changing the charset to ISO-8859-1, but it makes no difference.

I edit the original JSON in the following manner:

        // Response body as decoded by HttpClient, using whatever charset it assumes
        String responseText = getMethod.getResponseBodyAsString();

        JSONObject resultJSON = new JSONObject();
        try {
            JSONObject jsonObj = new JSONObject(responseText);
            JSONArray resultJsonArray = jsonObj.getJSONArray("RES");

            for (int iCnt = 0; iCnt < resultJsonArray.length(); iCnt++) {
                // Keep only the fields the page needs from each GSA result
                JSONObject searchResultJSON = new JSONObject();

                JSONObject obj = resultJsonArray.getJSONObject(iCnt);
                JSONObject metaTagObj = obj.getJSONObject("MT");

                if (metaTagObj.has("title")) {
                    searchResultJSON.put("title", metaTagObj.get("title").toString());
                }
                resultJSON.accumulate("RES", searchResultJSON);
            }
            response.setContentType("application/json;charset=UTF-8");
            response.getWriter().print(resultJSON);
        } catch (JSONException e) {
            // Malformed GSA JSON is silently ignored here
        }

The modification to the original JSON that I'm doing here could be done in JavaScript, which would solve my problem, but that is something I do not want to do.

  1. Is there a way to find out the encoding of the text in the original GSA JSON?
  2. How can I prevent the Java code from changing the text encoding of the original GSA JSON?

Please help me understand what is going on here and how I can avoid this problem.

Answer

Hari Reddy · Apr 15, 2013

The text encoding problem occurred because the call to the GSA server, made with Apache HttpClient, was using the client's default content charset of ISO-8859-1, whereas the GSA server expected the HttpClient request and response to be UTF-8 encoded. The UTF-8 response body was therefore decoded with the wrong charset before my code ever touched it.
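The mismatch described above can be reproduced with the standard library alone. The following is a minimal sketch, not the original application code: the class and method names (`MojibakeDemo`, `misdecode`, `repair`) are made up for illustration, and the sample string is the one from the question.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Decode UTF-8 bytes with the wrong charset (ISO-8859-1), which is what
    // a client with an ISO-8859-1 default effectively does to a UTF-8 body.
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverse the mistake: recover the raw bytes and decode them as UTF-8.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Curly quotes U+201C and U+201D take three bytes each in UTF-8
        String original = "\u201CAll Access Pass\u201D";
        String garbled = misdecode(original);

        System.out.println("original length: " + original.length());
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("repair restores original: " + repair(garbled).equals(original));
    }
}
```

Each curly quote becomes three separate characters once its UTF-8 bytes are read as ISO-8859-1, which is why the quotes on the page turn into stray accented characters. Changing only the servlet's response charset cannot undo this, because the string is already garbled when it is parsed.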

The problem was resolved by setting the content charset on the HttpClient:

HttpClient httpClient = new HttpClient();
httpClient.getParams().setContentCharset("UTF-8");

and by setting the servlet response encoding to:

response.setContentType("application/json;charset=UTF-8");
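On question 1: the declared encoding of an HTTP response normally comes from the `charset` parameter of its `Content-Type` header, and JSON exchanged between systems is conventionally UTF-8. A sketch of reading that parameter and decoding the raw bytes explicitly is shown below. The class and method names (`ResponseCharset`, `fromContentType`, `decodeBody`) are hypothetical, and the UTF-8 fallback is an assumption that suits JSON rather than something taken from the GSA documentation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ResponseCharset {

    // Hypothetical helper: extract the charset parameter from a Content-Type
    // header value, falling back to UTF-8 when none is declared or usable.
    static Charset fromContentType(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // unknown or malformed charset name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8; // assumed default for JSON
    }

    // Decode the raw response bytes with the declared (or fallback) charset
    // instead of relying on a client-side default.
    static String decodeBody(byte[] body, String contentType) {
        return new String(body, fromContentType(contentType));
    }
}
```

With Commons HttpClient 3.x, the raw bytes would come from `getMethod.getResponseBody()` and the header value from `getMethod.getResponseHeader("Content-Type")`; `getMethod.getResponseCharSet()` reports the charset the client itself would use, which is a quick way to check whether it has fallen back to ISO-8859-1.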