Unicode vs Multi-byte

c unicode visual-c++ multibyte

Rayne · Feb 9, 2010 · Viewed 12.2k times · Source

I'm really confused by this unicode vs multi-byte thing.

Say I'm compiling my program in Unicode (but ultimately, I want a solution that is independent of the character set used).

1) Will all 'char' be interpreted as wide characters?

2) If I have a simple printf statement, i.e. printf("Hello World\n"); with no character strings, can I just leave it be without using _tprintf and _T("...")? If the printf statement includes a character string, then I should use _tprintf and _T("..."), i.e. _tprintf("Hello %s\n", name); ?

3) If I have a text file (saved in the default format, i.e. without changing the default character set used) that I want to read into a buffer, can I still use char instead of TCHAR? Especially if I'm reading it character by character, i.e. by incrementing the character pointer?

Thank you.

Regards, Rayne

Answer

First, if you're compiling with UNICODE/_UNICODE and don't intend to target other platforms, you can skip the TCHAR macros entirely and use WCHAR (or wchar_t) and the W-suffixed functions everywhere.

1) Will all 'char' be interpreted as wide characters?

char in C is--by definition--1 byte. (This doesn't technically preclude it from being a "wide character" on platforms where wchar_t is also 1 byte, but given that you're using MSVC and are targeting Windows platforms, that's not going to be the case.)

So for practical purposes, the answer to this is: no.
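To make that concrete, here is a minimal sketch in standard C (the helper names are made up for illustration). It only reports sizes: a char is always 1 byte regardless of the project's character-set setting, while wchar_t is a separate type whose size depends on the platform (2 bytes with MSVC on Windows).

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helpers, named for this example only. */

/* Size of one narrow code unit: always 1 by definition. */
size_t narrow_unit_size(void) { return sizeof(char); }

/* Size of one wide code unit: platform dependent (2 with MSVC). */
size_t wide_unit_size(void) { return sizeof(wchar_t); }

/* Storage for the same text, including the terminating null. */
size_t narrow_literal_bytes(void) { return sizeof("Hi"); }
size_t wide_literal_bytes(void) { return sizeof(L"Hi"); }
```

Defining UNICODE/_UNICODE changes what TCHAR and the _T() macro expand to; it does not change char or plain "..." literals.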

2) If I have a simple printf statement, i.e. printf("Hello World\n"); with no character strings, can I just leave it be without using _tprintf and _T("...")? If the printf statement includes a character string, then I should use _tprintf and _T("..."), i.e. _tprintf("Hello %s\n", name); ?

If you're printing ASCII string literals, you can continue using printf.

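If you're printing arbitrary strings that could lie outside of the ASCII range, you should use _tprintf (or wprintf). Note that _tprintf needs its format string wrapped in _T("...") as well, i.e. _tprintf(_T("Hello %s\n"), name); with name declared as a TCHAR string.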
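A small sketch of the character-set-independent form, assuming MSVC's <tchar.h>. The function greet is a hypothetical helper, and the #else branch holds stand-in definitions (not part of any real header) so the snippet also builds as a narrow-character program off Windows.

```c
#include <stdio.h>

#ifdef _WIN32
#include <tchar.h>            /* TCHAR, _T, _tprintf */
#else
/* Stand-ins for illustration only: narrow-character equivalents. */
typedef char TCHAR;
#define _T(x) x
#define _tprintf printf
#endif

/* Hypothetical helper: prints a greeting using whichever character
   type the build selects. Returns the number of characters written. */
int greet(const TCHAR *name)
{
    /* With _UNICODE defined, MSVC maps this to wprintf and %s expects
       a wide string; otherwise it maps to printf and %s expects char. */
    return _tprintf(_T("Hello %s\n"), name);
}
```

Calling greet(_T("Rayne")) compiles unchanged in both Unicode and multi-byte builds, because the literal and the parameter type switch together.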

3) If I have a text file (saved in the default format, i.e. without changing the default character set used) that I want to read into a buffer, can I still use char instead of TCHAR? Especially if I'm reading it character by character, i.e. by incrementing the character pointer?

What is "the default format"?

When you're reading in an external file, you should read the first few bytes to check for a UTF-16 or UTF-8 BOM (byte order mark), and then base your decisions on that.
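A rough sketch of that BOM check, assuming the first bytes of the file have already been read into a buffer. The enum and the function name detect_bom are made up for this example. The byte sequences themselves are the standard ones: EF BB BF for UTF-8, FF FE for UTF-16 little-endian, FE FF for UTF-16 big-endian.

```c
#include <stddef.h>

/* Hypothetical result type for this example. */
typedef enum { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE } bom_kind;

/* Inspect the leading bytes of a buffer for a byte order mark.
   buf holds the start of the file; len is how many bytes are valid. */
bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;   /* could also be a UTF-32 LE BOM (FF FE 00 00) */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;
    return BOM_NONE;           /* no BOM: encoding must be decided another way */
}
```

If the result is BOM_NONE, the file may still be UTF-8 or a legacy code page, since a BOM is optional. In that case, and in the UTF-8 case, reading into a char buffer is reasonable, but stepping one char at a time moves one byte at a time, not necessarily one character.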
