Length of a C++ std::string in bytes

ComicSansMS · Oct 12, 2011 · Viewed 24.4k times

I'm having some trouble figuring out the exact semantics of std::string.length(). The documentation explicitly points out that length() returns the number of characters in the string and not the number of bytes. I was wondering in which cases this actually makes a difference.

In particular, is this only relevant to non-char instantiations of std::basic_string<> or can I also get into trouble when storing UTF-8 strings with multi-byte characters? Does the standard allow for length() to be UTF8-aware?

Answer

Lightness Races in Orbit · Oct 12, 2011

When dealing with non-char instantiations of std::basic_string<>, sure, length may not equal number of bytes. This is particularly evident with std::wstring:

std::wstring ws = L"hi";
std::cout << ws.length();     // <-- 2, not 2 * sizeof(wchar_t) (e.g. 4 with a 2-byte wchar_t)

But std::string is about char characters; there is no such thing as a multi-byte character as far as std::string is concerned, whether you crammed one in at a high level or not. So, std::string.length() is always the number of bytes held by the string (sizeof(char) is 1 by definition); it is not UTF-8-aware. Note that if you're cramming multibyte "characters" into a std::string, then your definition of "character" suddenly becomes at odds with that of the container and of the standard.
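
To make the difference concrete, here is a minimal sketch. The helper names byte_size and utf8_code_points are made up for illustration and are not part of the standard library, and the code point counter assumes the input is already valid UTF-8 (it does no validation).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes occupied by the character data of
// any std::basic_string, i.e. element count times element size.
template <typename CharT>
std::size_t byte_size(const std::basic_string<CharT>& s) {
    return s.length() * sizeof(CharT);
}

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes, which always have the bit pattern 10xxxxxx.
// Assumption: the input is valid UTF-8; nothing is checked here.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return count;
}
```

For std::string("h\xC3\xA9llo"), which is "héllo" with é spelled out as the two UTF-8 bytes 0xC3 0xA9, length() returns 6 while utf8_code_points returns 5. For std::wstring(L"hi"), byte_size returns 2 * sizeof(wchar_t), which depends on the platform. Note that code points are still not the same thing as user-perceived characters (combining marks, for instance), so this is only one possible definition of "character".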