I've observed this issue for years now, not knowing where it came from. I am concerned that this bug is still observable in the new versions of Android, in 2011, and I hope you can finally help me to fully understand it, if not solve it.
Let's consider the given (real) situation. Mister "A" is using a custom SMS/MMS app from Sony on his Xperia Arc (official 2.3.3). Mister B is using the android SMS/MMS stack app on his Milestone (Cyanogen 6.12, unofficial 2.2). Both of them use Android in French (if that matters).
When A sends a sms to B containing special characters like "ç", "ê", B receives a message with these characters replaced by a space. Characters like "é" are working fine though. When B sends the sms to A, everything works fine. When A sends this sms to himself, everything works fine.
Conclusion : this is not the mobile provider's fault since it works in one way and not the other.
So, I guessed at first that something was wrong with A's custom app. Replaced it with the apk from B's phone. Everything remained the same. I decompiled the app and I didn't find where the encoding of the sms string was done. I concluded the bug is not coming from the app, but from the way Android encodes the strings...
I ran another test : I wrote an sms with only standard characters, something like 250 characters in 1.5 sms. Then, I append a "ç" to the sms. On A's phone : the counter says it consumed 10 characters. On B's phone : the counter says the sms now takes 3 sms : the string size doubled !
Conclusion : On A's phone, the default charset includes "ç". On B's phone, when "ç" appears, the charset changes and each character needs then twice the original space. (Or am I missing something ?)
Questions : Why different version of Android aren't using the same default charset ? On Android, are these default charset depending on the rom, for example ? Can we configure/change these charset somewhere (in the menu or directly on a rooted phone) ? Is there another easy way to fix this ?
Any help, explanation or experience is welcome :)
You are suffering from encoding problems. From the description it looks like 'A' is sending data in one charset and not including information about what charset that is. The root cause is that to pass extended (non-ascii) characters between two systems they have to agree on an encoding to use. If you are restricted to 8 bit values then the systems agree to use the same codepages. In SMS there is a special GSM codepage for 7 or 8 bit encodings or UTF-16 can be used which uses 2 bytes to represent each character. What you see when you enter 250 characters followed by a single extended character shows you what is happening in the application. An SMS message is restricted to 140 octets. When you are using an 8 bit encoding your 250 chars fit into 2 messages (250 < 280) however once you added the "ç" the app changed to using UTF-16 encoding so suddenly all your characters are taking 2 octets and you can only fit 70 characters into a message. Now it takes 3.5 SMS messages to transfer the entire message.
On Android the decoding of the SMS message is part of the framework telephony code in SmsCbMessage.java. It works out the language code and encoding of the message body. If this is incorrect (the message was encoded with an english codepage but uses french extended chars) then you can get odd characters appearing.
You are right that this is not the mobile network at fault. I suspect it is phone A's messaging application although it is possible that Android is failing to correctly identify the encoding of a valid SMS. I wonder how it works between A and an iPhone or some other manufacturers device.