How to output extended ASCII characters using Oracle utl_file

Superdooperhero · Jun 11, 2013 · Viewed 12k times

I was writing files using

l_file := utl_file.fopen('OUT', 'a.txt', 'w');
utl_file.put_line(l_file, 'Rosëttenville');

but I changed this to

l_file := utl_file.fopen_nchar('OUT', 'a.txt', 'w', 32767);
utl_file.put_line_nchar(l_file, 'Rosëttenville');

when I found out that the extended ASCII characters (those above code 127) were not being written out correctly. However, the second Unicode version also does not write the extended characters correctly: instead of Rosëttenville I'm getting RosÃ«ttenville. Does anyone know how to fix this?

Answer

Alex Poole · Jun 11, 2013

You haven't said what your database character set is, and thus whether it's legitimate to have 'extended ASCII' (probably ISO 8859-1, with chr(235) in this case) in a string, or if this is just a demo; you can check it as shown below. Either way, I think, your problem is trying to implicitly convert a non-Unicode string.
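If you're not sure, both the database and national character sets can be read from the standard nls_database_parameters data dictionary view:

select parameter, value
  from nls_database_parameters
 where parameter in ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');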

ë is code point EB in ISO 8859-1, which is C3 AB in UTF-8. You're getting the separate characters Ã (code point C3) and « (code point AB). So it can't do a direct translation from chr(235), which is 0x00EB, to U+00EB. It seems to be going via the UTF-8 C3 AB as two separate characters. I'm not going to try to understand exactly why...
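If you want to confirm what bytes are actually in the string, DUMP with format 16 shows them in hex; in my AL32UTF8 database the literal holds the two UTF-8 bytes:

select dump('ë', 16) as bytes from dual;
-- Typ=96 Len=2: c3,ab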

You can either use the convert function:

l_file := utl_file.fopen('OUT', 'a.txt', 'w');
utl_file.put_line(l_file,
  convert('Rosëttenville', 'WE8ISO8859P1', 'UTF8'));

... or, as use of that is discouraged by Oracle, the utl_raw.convert function:

l_file := utl_file.fopen('OUT', 'a.txt', 'w');
utl_file.put_line(l_file,
  utl_raw.cast_to_varchar2(utl_raw.convert(utl_raw.cast_to_raw('Rosëttenville'),
    'ENGLISH_UNITED KINGDOM.WE8ISO8859P1', 'ENGLISH_UNITED KINGDOM.UTF8')));

Both give me the value you want, and your original gave me the same value you saw (my DB character set is AL32UTF8, on 11gR2 on Linux). If your DB character set is not Unicode, your national character set certainly appears to be (it isn't clear from the question whether you got the same output with both attempts), so the nchar version should work instead:

l_file := utl_file.fopen_nchar('OUT', 'a.txt', 'w', 32767);
utl_file.put_line_nchar(l_file,
  utl_raw.cast_to_varchar2(utl_raw.convert(utl_raw.cast_to_raw('Rosëttenville'),
    'ENGLISH_UNITED KINGDOM.WE8ISO8859P1', 'ENGLISH_UNITED KINGDOM.UTF8')));
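Whichever variant you go with, remember to close the file; as a minimal end-to-end sketch, using the first convert form and the same 'OUT' directory and file name from your question, the whole thing would look something like:

declare
  l_file utl_file.file_type;
begin
  l_file := utl_file.fopen('OUT', 'a.txt', 'w');
  utl_file.put_line(l_file,
    convert('Rosëttenville', 'WE8ISO8859P1', 'UTF8'));
  utl_file.fclose(l_file);
exception
  when others then
    -- don't leave the file handle open if the write fails
    if utl_file.is_open(l_file) then
      utl_file.fclose(l_file);
    end if;
    raise;
end;
/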

It would probably be better to be working with Unicode values in the first place, particularly if you currently have a mix of 'extended ASCII' and other string types in a table; applying the conversion to everything in that case might give some odd results...
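For example, if the value came out of an NVARCHAR2 column rather than a character-set-dependent literal, the nchar version from your second attempt should just work; a rough sketch (the table and column names here are made up for illustration):

declare
  l_file utl_file.file_type;
  l_name nvarchar2(100);
begin
  -- hypothetical table holding the names as NVARCHAR2 (i.e. already Unicode)
  select city_name into l_name from cities where rownum = 1;
  l_file := utl_file.fopen_nchar('OUT', 'a.txt', 'w', 32767);
  utl_file.put_line_nchar(l_file, l_name);
  utl_file.fclose(l_file);
end;
/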