Given a Unicode string and these requirements:
For example, the iPhone push service requires JSON encoding with a maximum total packet size of 256 bytes.
What is the best way to truncate the string so that it re-encodes to valid Unicode and still displays reasonably correctly?
(Human language comprehension is not necessary: the truncated version can look odd, e.g. with an orphaned combining character or a Thai vowel, as long as the software doesn't crash when handling the data.)
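def unicode_truncate(s, length, encoding='utf-8'):
    # Cut the encoded bytes at the limit; this may split a multi-byte character.
    encoded = s.encode(encoding)[:length]
    # 'ignore' drops the incomplete trailing byte sequence instead of raising.
    return encoded.decode(encoding, 'ignore')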
Here is an example with a Unicode string in which each character takes 2 bytes in UTF-8. The 5-byte limit falls in the middle of the third character, so that partial character is dropped:
>>> unicode_truncate(u'абвгд', 5)
u'\u0430\u0431'
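
The same approach can be checked against a string that mixes 1-, 2-, and 4-byte characters. The sketch below assumes Python 3 (where every `str` is Unicode, so the `u` prefix is not needed) and repeats the function so that it runs on its own; the sample string and the range of limits are only illustrative.

```python
def unicode_truncate(s, length, encoding='utf-8'):
    # Same function as above, repeated so this snippet is self-contained.
    encoded = s.encode(encoding)[:length]
    return encoded.decode(encoding, 'ignore')

# In UTF-8, 'a' takes 1 byte, 'б' takes 2 bytes, and '😀' takes 4 bytes.
sample = 'aб😀'

for limit in range(8):
    truncated = unicode_truncate(sample, limit)
    # The result re-encodes cleanly and never exceeds the byte limit.
    assert len(truncated.encode('utf-8')) <= limit
    print(limit, repr(truncated))
```

Note that the limit applies to the raw encoded bytes. If the string is later wrapped in JSON, any escaping and surrounding quotes add to the size, so the budget passed as `length` would need to leave room for that overhead.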