Convert zero-padded bytes to UTF-8 string

Matt Joiner · Feb 22, 2011

I'm unpacking several structs that contain 's' type fields from C. The fields hold zero-padded UTF-8 strings written by strncpy in the C code (note this function's vestigial behaviour: it fills the rest of the destination buffer with zero bytes). If I decode the bytes, I get a unicode string with lots of NUL characters on the end.

>>> b'hiya\0\0\0'.decode('utf8')
'hiya\x00\x00\x00'

I was under the impression that trailing zero bytes were part of UTF-8 and would be dropped automatically.

What's the proper way to drop the zero bytes?

Answer

Adam Rosenfield · Feb 22, 2011

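UTF-8 gives zero bytes no special treatment: each one decodes to the character U+0000, so nothing is dropped automatically. Use str.rstrip() on the decoded string to remove the trailing NULs: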

>>> 'hiya\0\0\0'.rstrip('\0')
'hiya'
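
For the struct-unpacking case in the question, the same idea can be applied to the bytes before decoding. This is safe because the byte 0x00 never appears inside a multi-byte UTF-8 sequence. The following is only a sketch: the helper names decode_padded and decode_c_string and the "<8sH" record layout are made up for illustration and are not from the question. The second helper covers the case where the buffer was not written by strncpy and may hold stale bytes after the first NUL.

```python
import struct


def decode_padded(field: bytes, encoding: str = "utf-8") -> str:
    """Drop trailing NUL padding from a fixed-width field, then decode it."""
    return field.rstrip(b"\0").decode(encoding)


def decode_c_string(field: bytes, encoding: str = "utf-8") -> str:
    """Keep only the bytes before the first NUL, as C string functions would."""
    return field.partition(b"\0")[0].decode(encoding)


# Hypothetical record: an 8-byte name field followed by a little-endian uint16.
# struct.pack pads the 's' field with zero bytes, much like strncpy does.
raw = struct.pack("<8sH", "hiya".encode("utf-8"), 7)
name_bytes, count = struct.unpack("<8sH", raw)

print(decode_padded(name_bytes))          # hiya
print(decode_c_string(b"hi\0ya\0\0\0"))   # hi
```

rstrip only removes NULs at the very end, so an embedded NUL survives it; partition cuts at the first NUL instead. Which of the two is right depends on how the C side fills the buffer.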