Do certain characters take more bytes than others?

Tom picture Tom · Jun 26, 2009 · Viewed 14k times · Source

I'm not very experienced with lower level things such as howmany bytes a character is. I tried finding out if one character equals one byte, but without success.

I need to set a delimiter used for socket connections between a server and clients. This delimiter has to be as small (in bytes) as possible, to minimize bandwidth.

The current delimiter is "#". Would getting an other delimiter decrease my bandwidth?

Answer

Michael Borgwardt picture Michael Borgwardt · Jun 26, 2009

It depends on what character encoding you use to translate between characters and bytes (which are not at all the same thing):

  • In ASCII or ISO 8859, each character is represented by one byte
  • In UTF-32, each character is represented by 4 bytes
  • In UTF-8, each character uses between 1 and 4 bytes
  • In ISO 2022, it's much more complicated

US-ASCII characters (of whcich # is one) will take only 1 byte in UTF-8, which is the most popular encoding that allows multibyte characters.