I'm unpacking several structs from C that contain 's' type fields. The fields hold zero-padded UTF-8 strings written by strncpy in the C code (note that strncpy fills the rest of the destination buffer with zero bytes when the source is shorter than the field). If I decode the bytes, I get a unicode string with lots of NUL characters on the end.
>>> b'hiya\0\0\0'.decode('utf8')
'hiya\x00\x00\x00'
I was under the impression that trailing zero bytes were part of UTF-8 and would be dropped automatically.
What's the proper way to drop the zero bytes?
Use str.rstrip() to remove the trailing NULs:
>>> 'hiya\0\0\0'.rstrip('\0')
'hiya'
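
For the struct-unpacking case in the question, the same idea can be applied to the raw bytes before decoding. The following is only a minimal sketch: the record layout (`'<8sH'`), the field values, and the helper name `decode_padded` are made up for illustration and are not taken from the question.

```python
import struct

# Hypothetical record layout: an 8-byte zero-padded name, then a uint16.
RECORD = struct.Struct('<8sH')

def decode_padded(raw, encoding='utf-8'):
    """Strip trailing NUL padding from a fixed-width field, then decode."""
    return raw.rstrip(b'\0').decode(encoding)

# struct pads an 's' field with zero bytes when the value is shorter,
# which mimics what the C side produces with strncpy.
packed = RECORD.pack('hiya'.encode('utf-8'), 42)

name_raw, value = RECORD.unpack(packed)
name = decode_padded(name_raw)
print(repr(name), value)
```

Stripping before decoding works on bytes, so `rstrip` needs a bytes argument (`b'\0'`). If the buffer might contain leftover data after the first terminator, `raw.partition(b'\0')[0]` keeps only what precedes the first NUL instead.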