I was writing files using
l_file := utl_file.fopen('OUT', 'a.txt', 'w');
utl_file.put_line(l_file, 'Rosëttenville');
but I changed this to
l_file := utl_file.fopen_nchar('OUT', 'a.txt', 'w', 32767);
utl_file.put_line_nchar(l_file, 'Rosëttenville');
when I found out that the extended ASCII characters (codes above 127) were not being written out correctly. However, the second Unicode version also doesn't write the extended characters correctly: instead of Rosëttenville I'm getting RosÃ«ttenville. Anyone know how to fix this?
You haven't said what your database character set is, and thus whether it's legitimate to have 'extended ASCII' (probably ISO 8859-1, with chr(235) in this case) in a string, or if this is just a demo. Either way, I think your problem is the attempt to implicitly convert a non-Unicode string. ë is code point EB, which in UTF-8 is the two-byte sequence C3 AB. You're getting the separate characters Ã (code point C3) and « (code point AB). So it can't do a direct translation from chr(235), which is 0x00EB, to U+00EB; it seems to be going via the UTF-8 bytes C3 AB, treated as two separate characters. I'm not going to try to understand exactly why...
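The byte-level round trip described above can be reproduced outside Oracle. A minimal Python sketch (just an illustration of the encoding arithmetic, not of UTL_FILE itself):

```python
# The single character ë (U+00EB) becomes two bytes in UTF-8, and
# reading those bytes back as ISO 8859-1 yields the two characters
# Ã (0xC3) and « (0xAB) - exactly the mojibake seen in the question.
utf8_bytes = 'ë'.encode('utf-8')
print(utf8_bytes.hex())                     # c3ab

mojibake = utf8_bytes.decode('iso-8859-1')
print(mojibake)                             # Ã«

# The same round trip on the whole word reproduces the broken output
print('Rosëttenville'.encode('utf-8').decode('iso-8859-1'))  # RosÃ«ttenville
```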
You can either use the convert function:
l_file := utl_file.fopen('OUT', 'a.txt', 'w');
utl_file.put_line(l_file,
convert('Rosëttenville', 'WE8ISO8859P1', 'UTF8'));
... or, as use of that function is discouraged by Oracle, the utl_raw.convert function:
l_file := utl_file.fopen('OUT', 'a.txt', 'w');
utl_file.put_line(l_file,
utl_raw.cast_to_varchar2(utl_raw.convert(utl_raw.cast_to_raw('Rosëttenville'),
'ENGLISH_UNITED KINGDOM.WE8ISO8859P1', 'ENGLISH_UNITED KINGDOM.UTF8')));
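The utl_raw pipeline above can be sketched in Python terms, assuming the double-encoding explanation given earlier (a UTF-8 database character set, with the writer applying one extra UTF-8 encoding step). cast_to_raw exposes the string's raw bytes, convert re-encodes them from one character set to another, and cast_to_varchar2 wraps the result back up as a string:

```python
original = 'Rosëttenville'

# cast_to_raw: the raw UTF-8 bytes of the stored string
raw = original.encode('utf-8')                        # b'Ros\xc3\xabttenville'

# Without the conversion, the writer's extra UTF-8 encoding step
# double-encodes the bytes, producing the mojibake from the question
broken = raw.decode('iso-8859-1').encode('utf-8')
print(broken.decode('utf-8'))                         # RosÃ«ttenville

# convert(..., 'WE8ISO8859P1', 'UTF8'): re-encode the text as Latin-1,
# leaving one byte per character (EB for ë)
converted = raw.decode('utf-8').encode('iso-8859-1')  # b'Ros\xebttenville'

# The writer's UTF-8 encoding step is now applied only once,
# so the file ends up with the correct C3 AB sequence for ë
written = converted.decode('iso-8859-1').encode('utf-8')
print(written.decode('utf-8'))                        # Rosëttenville
```

In other words, the pre-conversion to WE8ISO8859P1 cancels out the unwanted extra encoding, which is why both convert and utl_raw.convert produce the right file contents.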
Both give me the value you want, and your original code gave me the same value you saw (my DB character set is AL32UTF8, on 11gR2 on Linux). If your DB character set is not Unicode, your national character set certainly appears to be (it isn't clear from the question whether you got the same output with both attempts), so the nchar version should work instead:
l_file := utl_file.fopen_nchar('OUT', 'a.txt', 'w', 32767);
utl_file.put_line_nchar(l_file,
utl_raw.cast_to_varchar2(utl_raw.convert(utl_raw.cast_to_raw('Rosëttenville'),
'ENGLISH_UNITED KINGDOM.WE8ISO8859P1', 'ENGLISH_UNITED KINGDOM.UTF8')));
It would probably be better to work with Unicode values in the first place, particularly if you currently have a mix of 'extended ASCII' and other string types in a table; applying the conversion to everything in that case might give some odd results...