I am having this std::string which contains some characters that span multiple bytes.
When I do a substring on this string, the output is not valid, because ofcourse, these characters are counted as 2 characters. In my opinion I should be using a wstring instead, because it will store these characters in as one element instead of more.
So I decided to copy the string into a wstring, but ofcourse this does not make sense, because the characters remain split over 2 characters. This only makes it worse.
Is there a good solution on converting a string to a wstring, merging the special characters into 1 element instead of 2.
Thanks
Simpler version. based on the solution provided Getting the actual length of a UTF-8 encoded std::string? by Marcelo Cantos
std::string substr(std::string originalString, int maxLength)
{
std::string resultString = originalString;
int len = 0;
int byteCount = 0;
const char* aStr = originalString.c_str();
while(*aStr)
{
if( (*aStr & 0xc0) != 0x80 )
len += 1;
if(len>maxLength)
{
resultString = resultString.substr(0, byteCount);
break;
}
byteCount++;
aStr++;
}
return resultString;
}