Android ICS 4.0 NDK NewStringUTF is crashing down the App

rana picture rana · Aug 26, 2012 · Viewed 26.3k times · Source

I have a method in JNI C/C++ which takes jstring and returns back jstring some thing like as below,

  NATIVE_CALL(jstring, method)(JNIEnv * env, jobject obj, jstring filename)
  {

// Get jstring into C string format.
  const char* cs = env->GetStringUTFChars (filename, NULL);
  char *file_path = new char [strlen (cs) + 1]; // +1 for null terminator
  sprintf (file_path, "%s", cs);
  env->ReleaseStringUTFChars (filename, cs);


  reason_code = INTERNAL_FAILURE;
  char* info = start_module(file_path);  


  jstring jinfo ;


  if(info==NULL)
  {
      jinfo = env->NewStringUTF(NULL);
  }
  else
  {
      jinfo = env->NewStringUTF(info);

  }


  delete info;

  info = NULL;
  return jinfo;
  }

The code works perfectly with prior android 4.0 versions like 2.2,2.3 and so on. With ICS 4.0 check JNI is on by default and because of it the app crashes throwing the following error

 08-25 22:16:35.480: W/dalvikvm(24027): **JNI WARNING: input is not valid Modified UTF-8: illegal  continuation byte 0x40**
08-25 22:16:35.480: W/dalvikvm(24027):              
08-25 22:16:35.480: W/dalvikvm(24027): ==========
08-25 22:16:35.480: W/dalvikvm(24027): /tmp/create
08-25 22:16:35.480: W/dalvikvm(24027): ==========
08-25 22:16:35.480: W/dalvikvm(24027): databytes,indoorgames,drop
08-25 22:16:35.480: W/dalvikvm(24027): ==========���c_ag����ϋ@�ډ@�����@'
 08-25 22:16:35.480: W/dalvikvm(24027):              in Lincom/inter       /ndk/comNDK;.rootNDK:(Ljava/lang/String;)Ljava/lang/String; **(NewStringUTF)**
08-25 22:16:35.480: I/dalvikvm(24027): "main" prio=5 tid=1 NATIVE
08-25 22:16:35.480: I/dalvikvm(24027):   | group="main" sCount=0 dsCount=0 obj=0x40a4b460   self=0x1be1850
08-25 22:16:35.480: I/dalvikvm(24027):   | sysTid=24027 nice=0 sched=0/0 cgrp=default handle=1074255080
08-25 22:16:35.490: I/dalvikvm(24027):   | schedstat=( 49658000 26700000 48 ) utm=1 stm=3 core=1
08-25 22:16:35.490: I/dalvikvm(24027):   at comrootNDK(Native Method)

I am clueless as to where i am wrong. If you see above NewStringUTF is adding some garbage value to the c Char* bytes .

  1. Any idea about why this is happening
  2. Any alternative solution to achieve the above is welcome

I really appreciate if one of you can help me in . Thanks in advance

regds me

Answer

Moshe Rubin picture Moshe Rubin · Oct 23, 2014

The cause of this problem is directly related to a known UTF-8 bug in the NDK/JNI GetStringUTFChars() function (and probably related functions like NewStringUTF). These NDK functions do not convert supplementary Unicode characters (i.e., Unicode characters with a value of U+10000 and above) correctly. This leads to incorrect UTF-8 and subsequent crashes.

I encountered the crash when handling user input text that contained emoticon characters (see the corresponding Unicode chart). Emoticon characters lie in the Supplementary Unicode character range.

Analysis of the Problem

  1. The Java client passes a string containing a supplementary Unicode character to JNI/NDK.
  2. JNI uses the NDK function GetStringUTFChars() to extract the contents of the Java string.
  3. GetStringUTFChars() returns the string data as incorrect and invalid UTF-8.

There is a known NDK bug whereby GetStringUTFChars() incorrectly converts supplementary Unicode characters, producing an incorrect and invalid UTF-8 sequence.

In my case, the resulting string was a JSON buffer. When the buffer was passed to the JSON parser, the parser promptly failed because one of the UTF-8 characters of the extracted UTF-8 had an invalid UTF-8 prefix byte.

Possible Workaround

The solution I've used can be summarized as follows:

  1. The goal is to prevent GetStringUTFChars() from performing the incorrect UTF-8 encoding of the supplementary Unicode character.
  2. This is done by the Java client encoding the request string as Base64.
  3. The Base64-encoded request is passed to JNI.
  4. JNI calls GetStringUTFChars(), which extracts the Base64-encoded string without performing any UTF-8 encoding.
  5. The JNI code then decodes the Base-64 data, producing the original UTF-16 (wide char) request string, including the supplementary Unicode character.

In this way we circumvent the problem of extracting supplementary Unicode characters from the Java string. Instead, we convert the data to Base-64 ASCII before calling GetStringUTFChars(), extract the Base-64 ASCII characters using GetStringUTFChars(), and convert the Base-64 data back to wide characters.