Character looks like ASCII 63 but isn't so I can't remove it

Lou picture Lou · Sep 15, 2014 · Viewed 9.3k times · Source

I'm reading text from a text file. The first string the text file has to read is "Algood ", and note the space. In Notepad, it appears that there is a space in this string, but it isn't. When I test the 6th (zero-based index) character in Visual Studio's QuickWatch, it appears as:

"�"c

When I use the Asc function to get the ASCII code, it tells me that the ASCII code is 63. 63 is a question mark. But when I test to see if the string contains ASCII 63, it tests false. So it appears that the string contains the character with the ASCII code 63, only it doesn't, it contains some other character which tests as ASCII code 63. This is a problem: I can't remove the character if I don't know what to call it. I could remove the last character, but not every string in the text file contains this character.

enter image description here

The question is: what is this character if not a question mark, and how can I uniquely identify so I can remove it?

Answer

Hans Passant picture Hans Passant · Sep 15, 2014

It is the Unicode replacement character, U+FFFD, aka ChrW(&HFFFD).

Never use Asc() or Chr(), they are legacy VB6 functions that do not handle Unicode. Passing a fancy Unicode codepoint to Asc() always produces 63, the character code for "?"c, aka "I have no idea what you're saying". The exact same idea as"�"c but using an ASCII code instead.

Seeing the Black Diamond of Death back is always bad news, something went wrong when the string was converted from the underlying byte values. Because some byte values did not produce a valid character. Which is what you really should be looking for, you always want to avoid GIGO. Garbage In Garbage Out is an ugly data corruption problem that has no winners, only victims. You.