How to encode a STRING variable into a given code page

Question 1

How to encode a STRING variable into a given code page

character-encoding abap codepages character-set

Sandra Rossi · Feb 9, 2020 · Viewed 9.5k times · Source

Answer

Answer

In the first part, I explain how to encode a string of characters into a given code page (all is done in memory), and in the second part, I explain specifically how to write files to the application server in a given code page.

General way (all in memory)

If a string of characters (type STRING) has to be encoded, the result has to be stored in a string of bytes, which corresponds to the built-in data type XSTRING.

There are several possibilities which depend on the ABAP version; I use :

Since 7.53, use the class CL_ABAP_CONV_CODEPAGE:

DATA(xstring) = cl_abap_conv_codepage=>create_out( codepage = `UTF-16LE` )->convert( source = `ABCDE` ).
Since 7.02, use the class CL_ABAP_CODEPAGE:

DATA xstring TYPE xstring.

xstring = cl_abap_codepage=>convert_to( source = `ABCDE` codepage = `UTF-16LE` ).
Before 7.02, use the class CL_ABAP_CONV_OUT_CE (documentation provided with the class):

First, instantiate the conversion object, use a SAP code page number instead of the ISO name (list of values shown hereafter):

DATA: conv TYPE REF TO CL_ABAP_CONV_OUT_CE, xstring TYPE xstring.

conv = CL_ABAP_CONV_OUT_CE=>CREATE( encoding = '4103' ). "4103 = utf-16le

Then encode the string and retrieve the bytes encoded:

conv->RESET( ).

conv->WRITE( data = `ABCDE` ).

xstring = conv->GET_BUFFER( ).

Eventually, instead of using RESET, WRITE and GET_BUFFER, the method CONVERT was added in 6.40 and retroported :

conv->CONVERT( EXPORTING data = `ABCDE` IMPORTING buffer = xstring ).

With the class CL_ABAP_CONV_OUT_CE, you need to use the number of the SAP Code Page, not the ISO name. Here are the most common SAP code pages and their equivalent ISO names:

1100: ISO-8859-1
1101: US-ASCII
1160: Windows-1252 ("ANSI")
1401: ISO-8859-2
4102: UTF-16BE
4103: UTF-16LE
4104: UTF-32BE
4105: UTF-32LE
4110: UTF-8
Etc. (the possible values are defined in the table TCP00A, in lines with column CPATTRKIND = 'H').

Writing a file on the application server in a given code page

In ABAP, OPEN DATASET can directly specify the target code page, most code pages are supported including UTF-8, but not other UTF (code pages 41xx) which can be done only by the solution explained in 2.3 below (by first encoding in memory).

2.1) IN TEXT MODE ENCODING ...

Possible ENCODING values:

UTF-8: in this mode, it's possible to add the Byte Order Mark if needed, via the option WITH BYTE-ORDER MARK.
DEFAULT: will be UTF-8 in a SAP "Unicode" system (that you can check via the menu System > Status > Unicode System Yes/No), NON-UNICODE otherwise.
NON-UNICODE: will depend on the current ABAP linguistic environment; for language English, it's the character encoding iso-8859-1, for language Polish, it's the character encoding iso-8859-2, etc. (the equivalences are shown in table TCP0C.)

Example in ABAP version 7.52 to write to UTF-8 with the byte order mark:

REPORT zmyprogram.
DATA(filename) = `/tmp/dataset_utf_8`.
OPEN DATASET filename IN TEXT MODE ENCODING UTF-8 WITH BYTE-ORDER MARK FOR OUTPUT.
TRY.
    TRANSFER `Witaj świecie` TO filename.
  CATCH cx_sy_conversion_codepage INTO DATA(lx).
    " Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.

Example in ABAP version 7.52 to write to iso-8859-2 (Polish language here):

REPORT zmyprogram.
SET LOCALE LANGUAGE 'L'. " Polish
DATA(filename) = `/tmp/dataset_nonunicode_pl`.
OPEN DATASET filename IN TEXT MODE ENCODING NON-UNICODE FOR OUTPUT.
TRY.
    TRANSFER `Witaj świecie` TO filename.
  CATCH cx_sy_conversion_codepage INTO DATA(lx).
    " Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.

2.2) IN LEGACY TEXT MODE CODE PAGE ...

Use any code page number except code pages 41xx (i.e. UTF-8 and other UTF; see workaround in 2.3 below).

Example in ABAP version 7.52 to write to iso-8859-2 (code page 1401) :

REPORT zmyprogram.
DATA(filename) = `/tmp/dataset_iso_8859_2`.

OPEN DATASET filename IN LEGACY TEXT MODE CODE PAGE '1401' FOR OUTPUT. " iso-8859-2
TRY.
    TRANSFER `Witaj świecie` TO filename.
  CATCH cx_sy_conversion_codepage INTO DATA(lx).
    " Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.

2.3) UTF = general way + IN BINARY MODE

Example in ABAP version 7.52:

REPORT zmyprogram.
TRY.
    DATA(xstring) = cl_abap_codepage=>convert_to( source = `Witaj świecie` codepage = `UTF-16LE` ).
  CATCH cx_sy_conversion_codepage INTO DATA(lx).
    " Character not supported in language code page
    BREAK-POINT.
ENDTRY.
DATA(filename) = `/tmp/dataset_utf_16le`.
OPEN DATASET filename IN BINARY MODE FOR OUTPUT.
TRANSFER xstring TO filename.
CLOSE DATASET filename.

Question 2

I've got a string variable containing a text that I need to encode and write to a file, in UTF-16LE code page.

Currently the following code generates a UTF-8 file and I don't see any option in the statement OPEN DATASET to generate the file in UTF-16LE.

REPORT zmyprogram.

DATA(filename) = `/tmp/myfile`.

OPEN DATASET filename IN TEXT MODE ENCODING DEFAULT FOR OUTPUT.

TRANSFER 'HELLO WORLD' TO filename.

CLOSE DATASET filename.

I guess one solution is to first encode the string in memory, then write the encoded bytes to the file.

Generally speaking, how to encode a string of characters into a given code page, in memory?

How to encode a STRING variable into a given code page

Answer

Related questions