Convert zero-padded bytes to UTF-8 string

python unicode utf-8 byte strncpy

Matt Joiner · Feb 22, 2011

I'm unpacking several structs that contain 's' type fields from C. The fields hold zero-padded UTF-8 strings written by strncpy in the C code (note that strncpy fills the rest of the destination buffer with zero bytes). When I decode the bytes I get a unicode string with a run of NUL characters at the end.

>>> b'hiya\0\0\0'.decode('utf8')
'hiya\x00\x00\x00'

I was under the impression that trailing zero bytes were part of UTF-8 and would be dropped automatically.

What's the proper way to drop the zero bytes?

Answer

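UTF-8 has no notion of padding: a zero byte decodes to U+0000 like any other character, so it is not dropped automatically. Use str.rstrip() to remove the trailing NULs: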

>>> 'hiya\0\0\0'.rstrip('\0')
'hiya'
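
For the struct case in the question, a minimal sketch might look like the following. The record layout (`<8sI`) and the helper name `decode_padded` are made up for illustration and are not from the question. Stripping the bytes before decoding gives the same result as stripping the decoded string, since a zero byte in UTF-8 only ever encodes U+0000.

```python
import struct


def decode_padded(raw: bytes, encoding: str = "utf-8") -> str:
    """Drop trailing NUL padding from a fixed-width field, then decode it."""
    return raw.rstrip(b"\0").decode(encoding)


# Hypothetical layout: an 8-byte zero-padded name followed by a
# little-endian unsigned 32-bit integer.
packed = struct.pack("<8sI", "hiya".encode("utf-8"), 7)

raw_name, value = struct.unpack("<8sI", packed)
name = decode_padded(raw_name)

print(repr(raw_name))  # the raw field still carries its padding
print(repr(name))      # the decoded string does not
```

rstrip only removes NULs at the end. If a buffer could hold leftover data after the first NUL (strncpy itself only writes zeros there), raw.split(b"\0", 1)[0] keeps just the part before the first NUL, which matches how C reads the string.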
