How to encode a STRING variable into a given code page

Sandra Rossi · Feb 9, 2020

I have a string variable containing text that I need to encode and write to a file in the UTF-16LE code page.

Currently, the following code generates a UTF-8 file, and I don't see any option in the OPEN DATASET statement to generate the file in UTF-16LE.

REPORT zmyprogram.

DATA(filename) = `/tmp/myfile`.

OPEN DATASET filename IN TEXT MODE ENCODING DEFAULT FOR OUTPUT.

TRANSFER 'HELLO WORLD' TO filename.

CLOSE DATASET filename.

I guess one solution is to first encode the string in memory, then write the encoded bytes to the file.

More generally, how can I encode a string of characters into a given code page in memory?

Answer

Sandra Rossi · Feb 9, 2020

In the first part, I explain how to encode a string of characters into a given code page (all is done in memory), and in the second part, I explain specifically how to write files to the application server in a given code page.

  1. General way (all in memory)

If a string of characters (type STRING) has to be encoded, the result has to be stored in a string of bytes, which corresponds to the built-in data type XSTRING.

There are several possibilities, depending on the ABAP version:

  • Since 7.53, use the class CL_ABAP_CONV_CODEPAGE:

    DATA(xstring) = cl_abap_conv_codepage=>create_out( codepage = `UTF-16LE` )->convert( source = `ABCDE` ).

  • Since 7.02, use the class CL_ABAP_CODEPAGE:

    DATA xstring TYPE xstring.

    xstring = cl_abap_codepage=>convert_to( source = `ABCDE` codepage = `UTF-16LE` ).

  • Before 7.02, use the class CL_ABAP_CONV_OUT_CE (documentation provided with the class):

    First, instantiate the conversion object, using an SAP code page number instead of the ISO name (common values are listed below):

    DATA: conv TYPE REF TO CL_ABAP_CONV_OUT_CE, xstring TYPE xstring.

    conv = CL_ABAP_CONV_OUT_CE=>CREATE( encoding = '4103' ). "4103 = utf-16le

    Then encode the string and retrieve the bytes encoded:

    conv->RESET( ).

    conv->WRITE( data = `ABCDE` ).

    xstring = conv->GET_BUFFER( ).

    Alternatively, instead of RESET, WRITE and GET_BUFFER, you can use the method CONVERT, which was added in 6.40 and backported to earlier releases:

    conv->CONVERT( EXPORTING data = `ABCDE` IMPORTING buffer = xstring ).

With the class CL_ABAP_CONV_OUT_CE, you need to use the number of the SAP Code Page, not the ISO name. Here are the most common SAP code pages and their equivalent ISO names:

  • 1100: ISO-8859-1
  • 1101: US-ASCII
  • 1160: Windows-1252 ("ANSI")
  • 1401: ISO-8859-2
  • 4102: UTF-16BE
  • 4103: UTF-16LE
  • 4104: UTF-32BE
  • 4105: UTF-32LE
  • 4110: UTF-8
  • Etc. (the possible values are defined in the table TCP00A, in lines with column CPATTRKIND = 'H').
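
As a quick sanity check of the in-memory conversion, the sketch below encodes the same `ABCDE` literal to UTF-16LE and compares the result with the bytes expected for that encoding (each ASCII character becomes two bytes, low byte first). It assumes release 7.40 or later for the inline declarations and CONV; the report name and the variable names `encoded` and `expected` are only illustrative.

```abap
REPORT zencode_check.

" Assumption: release 7.40 or later (inline declarations, CONV).
" Encode the sample literal into UTF-16LE bytes, entirely in memory.
DATA(encoded) = cl_abap_codepage=>convert_to( source   = `ABCDE`
                                              codepage = `UTF-16LE` ).

" 'A' is U+0041, so in little-endian UTF-16 it is stored as 41 00, and so on.
DATA(expected) = CONV xstring( '41004200430044004500' ).

" 5 characters x 2 bytes each = 10 bytes
ASSERT xstrlen( encoded ) = 10.
ASSERT encoded = expected.
```

Swapping the code page for `UTF-16BE` should reverse the byte order within each pair (00 41, 00 42, ...), which is an easy way to confirm which variant a target system expects.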

 

  2. Writing a file on the application server in a given code page

In ABAP, OPEN DATASET can specify the target code page directly. Most code pages are supported, including UTF-8, but not the other UTF encodings (the remaining 41xx code pages, such as UTF-16 and UTF-32). Those can be written only with the solution explained in 2.3 below, which encodes the text in memory first.

  • 2.1) IN TEXT MODE ENCODING ...

Possible ENCODING values:

  • UTF-8: in this mode, it's possible to add the Byte Order Mark if needed, via the option WITH BYTE-ORDER MARK.
  • DEFAULT: will be UTF-8 in a SAP "Unicode" system (that you can check via the menu System > Status > Unicode System Yes/No), NON-UNICODE otherwise.
  • NON-UNICODE: will depend on the current ABAP linguistic environment; for language English, it's the character encoding iso-8859-1, for language Polish, it's the character encoding iso-8859-2, etc. (the equivalences are shown in table TCP0C.)

Example in ABAP version 7.52 to write to UTF-8 with the byte order mark:

REPORT zmyprogram.
DATA(filename) = `/tmp/dataset_utf_8`.
OPEN DATASET filename IN TEXT MODE ENCODING UTF-8 WITH BYTE-ORDER MARK FOR OUTPUT.
TRY.
    TRANSFER `Witaj świecie` TO filename.
  CATCH cx_sy_conversion_codepage INTO DATA(lx).
    " Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.

Example in ABAP version 7.52 to write to iso-8859-2 (Polish language here):

REPORT zmyprogram.
SET LOCALE LANGUAGE 'L'. " Polish
DATA(filename) = `/tmp/dataset_nonunicode_pl`.
OPEN DATASET filename IN TEXT MODE ENCODING NON-UNICODE FOR OUTPUT.
TRY.
    TRANSFER `Witaj świecie` TO filename.
  CATCH cx_sy_conversion_codepage INTO DATA(lx).
    " Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.

  • 2.2) IN LEGACY TEXT MODE CODE PAGE ...

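Use any code page number except the 41xx code pages (UTF-8 and the other UTF encodings). For UTF-8, use IN TEXT MODE ENCODING UTF-8 as in 2.1 above; for the other UTF encodings, see the workaround in 2.3 below.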

Example in ABAP version 7.52 to write to iso-8859-2 (code page 1401):

REPORT zmyprogram.
DATA(filename) = `/tmp/dataset_iso_8859_2`.

OPEN DATASET filename IN LEGACY TEXT MODE CODE PAGE '1401' FOR OUTPUT. " iso-8859-2
TRY.
    TRANSFER `Witaj świecie` TO filename.
  CATCH cx_sy_conversion_codepage INTO DATA(lx).
    " Character not supported in language code page
ENDTRY.
CLOSE DATASET filename.

  • 2.3) Other UTF code pages: encode in memory (general way), then write IN BINARY MODE

Example in ABAP version 7.52:

REPORT zmyprogram.
TRY.
    " Encode the text in memory (general way from the first part)
    DATA(xstring) = cl_abap_codepage=>convert_to( source = `Witaj świecie` codepage = `UTF-16LE` ).
  CATCH cx_sy_conversion_codepage INTO DATA(lx).
    " Character not supported in the target code page: nothing to write
    RETURN.
ENDTRY.
DATA(filename) = `/tmp/dataset_utf_16le`.
" Binary mode writes the bytes as they are, without any further conversion
OPEN DATASET filename IN BINARY MODE FOR OUTPUT.
TRANSFER xstring TO filename.
CLOSE DATASET filename.
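
Binary mode writes exactly the bytes it is given, so a byte order mark has to be written explicitly if the program reading the file expects one. The following is a minimal sketch of that variant, assuming a BOM is actually wanted (many consumers of UTF-16LE files do not require it). The file path and the variable name `payload` are hypothetical; the BOM bytes come from the constant BYTE_ORDER_MARK_LITTLE of the class CL_ABAP_CHAR_UTILITIES. The sketch also checks sy-subrc after OPEN DATASET, so that a file that cannot be opened is not written to.

```abap
REPORT zwrite_utf16le_bom.

" Hypothetical target path, chosen only for this sketch
DATA(filename) = `/tmp/dataset_utf_16le_bom`.

TRY.
    " Encode the text in memory first
    DATA(payload) = cl_abap_codepage=>convert_to( source   = `Witaj świecie`
                                                  codepage = `UTF-16LE` ).
  CATCH cx_sy_conversion_codepage.
    " The text could not be converted to the target code page
    RETURN.
ENDTRY.

OPEN DATASET filename IN BINARY MODE FOR OUTPUT.
IF sy-subrc <> 0.
  " The file could not be opened (path, authorization, ...)
  RETURN.
ENDIF.

" Assumption: the consumer expects a BOM; FF FE marks UTF-16 little endian
TRANSFER cl_abap_char_utilities=>byte_order_mark_little TO filename.
" Then the encoded text itself (13 characters, 26 bytes)
TRANSFER payload TO filename.
CLOSE DATASET filename.
```

For UTF-16BE, the same pattern would use the code page `UTF-16BE` together with the constant BYTE_ORDER_MARK_BIG.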