I am trying to store some text (e.g. č) in a Postgres database, but when I retrieve the value it appears on screen as a question mark (?). I'm not sure why. I was under the impression that it was a character that wasn't supported in UTF-8 but was in UTF-16; judging by the first answer, however, that assumption is incorrect.
Original question (which may still be valid):
I have read about UTF-8 Surrogate pairs, which may achieve what I require, and I've seen a few examples involving the stringinfo object TextElementEnumerators, but I couldn't work out a practical proof of concept. Can someone provide an example of how you would write and read UTF-16 (probably using this surrogate pair concept) to a Postgres database? Thank you.
Updated question:
Why would the č character be returned from the database as a question mark?
We access the database from VB.Net using NPGSQL.
There's no such thing as a character which exists in UTF-16 but not UTF-8. Both are capable of encoding all of Unicode. In other words, if you can get UTF-8 to work, it should be able to store any valid Unicode text.
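As a quick illustration of that point, here is a minimal sketch (in Python rather than VB.Net, purely for brevity) that round-trips a few sample characters through both encodings. The sample characters and the helper name `round_trips` are my own choices for illustration, not anything from the question.

```python
# Sample characters: U+010D (č), U+20AC (€), and U+1F600, which lies outside the BMP.
samples = ["č", "€", "\U0001F600"]

def round_trips(text, encoding):
    """Return True if text survives an encode/decode cycle unchanged."""
    return text.encode(encoding).decode(encoding) == text

for s in samples:
    utf8_len = len(s.encode("utf-8"))
    utf16_len = len(s.encode("utf-16-le"))
    print(
        f"U+{ord(s):04X}: "
        f"utf-8 ok={round_trips(s, 'utf-8')} ({utf8_len} bytes), "
        f"utf-16 ok={round_trips(s, 'utf-16-le')} ({utf16_len} bytes)"
    )
```

Every sample survives both encodings; only the number of bytes differs. For example, č takes two bytes in either encoding.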
EDIT: Surrogate pairs are actually a feature of UTF-16 rather than UTF-8. They allow a character which isn't in the Basic Multilingual Plane (BMP) to be represented as two UTF-16 code units. Basically, UTF-16 is often treated as a fixed-width encoding (exactly two bytes per Unicode character), but that only allows the BMP to be encoded cleanly. Surrogate pairs are a (fairly hacky) way of extending the range beyond the BMP.
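To make the surrogate pair idea concrete, the following sketch (again Python, as an illustration only) lists the UTF-16 code units for a BMP character and for a character outside the BMP. The helper name `utf16_code_units` and the choice of U+1F600 as the non-BMP example are assumptions of mine.

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    # Each UTF-16 code unit is one unsigned 16-bit little-endian value.
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))

# č is U+010D, inside the BMP: a single code unit.
print([hex(u) for u in utf16_code_units("č")])           # ['0x10d']

# U+1F600 is outside the BMP: two code units, i.e. a surrogate pair.
print([hex(u) for u in utf16_code_units("\U0001F600")])  # ['0xd83d', '0xde00']
```

Since č fits in one code unit, surrogate pairs never come into play for it.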
I very much doubt that the character you're trying to represent is outside the BMP, so I suspect you need to look elsewhere for the problem. In particular, it's worth dumping the exact character values of the text (e.g. by casting each char to int) before it goes into the database and after you've fetched it. Ideally, do this in a short but complete console app.
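A rough sketch of that kind of dump is below, in Python rather than VB.Net. The `dump_chars` helper is hypothetical, and the `fetched` value is a hard-coded stand-in for whatever your database call actually returns; no real database access is shown.

```python
def dump_chars(label, text):
    """Print and return the numeric value of each character in text."""
    values = " ".join(f"U+{ord(c):04X}" for c in text)
    line = f"{label}: {values}"
    print(line)
    return line

original = "č"
# Hypothetical stand-in: replace this with the value read back from the database.
fetched = "?"

dump_chars("before insert", original)  # before insert: U+010D
dump_chars("after fetch", fetched)     # after fetch: U+003F
```

If the fetched value really is U+003F, the character was replaced somewhere between your code and the database. If it comes back as U+010D, the data is intact and the problem is in how it is displayed. Note that Python iterates over code points, whereas a .NET Char is a UTF-16 code unit, so a non-BMP character would show up as two values in VB.Net.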