While trying to parse some unicode text strings, I'm hitting an invisible character that I can't find any definition for. If I paste it in to a text editor and show invisibles, I can see that it looks like a bullet point (• alt-8), and by copy/pasting them, I can see it has an effect like a space or tab, but it's none of those.
I need to test for it, something like...
if(uniChar == L'\t')
But of course I need to provide something to match to.
It has bytes 0xc2 0xa0 in UTF-8.
If no-one has a definition, is there any devious way to test for something I can't define!?
(I happen to be using NSStrings in Objective-C, OSX, Xcode, but I don't think that has any bearing.)
Bytes C2 A0 in UTF-8 encode U+00A0 ɴᴏ-ʙʀᴇᴀᴋ sᴘᴀᴄᴇ, which can be used, for example, to display combining marks in isolation. It is
as a named HTML entity. It is almost the same as a U+0020 sᴘᴀᴄᴇ, except it prevents line breaks before or after it, and acts as a numerical separator for bidirectional layout.
The dot you see when you ask a text editor to show invisibles just happens to be what glyph the text editor chose to display spaces. It does not mean the character in question is U+00B7 ᴍɪᴅᴅʟᴇ ᴅᴏᴛ, which is definitely not invisible.
In code, if you have it as a unichar
, you can compare it to L'\x00A0'
.