We have a file in S3 that is loaded in to Redshift via the COPY
command. The import is failing because a VARCHAR(20)
value contains an Ä
which is being translated into ..
during the copy command and is now too long for the 20 characters.
I have verified that the data is correct in S3, but the COPY
command does not understand the UTF-8 characters during import. Has anyone found a solution for this?
the byte length for your varchar
column just needs to be larger.
Multi-byte characters (UTF-8) are supported in the varchar
data type, however the length that is provided is in bytes, NOT characters.
AWS documentation for Multibyte Character Load Errors states the following:
VARCHAR
columns accept multibyte UTF-8 characters, to a maximum of four bytes.
Therefore if you want the character Ä
to be allowed, then you need to allow 2 bytes for this character, instead of 1 byte.
AWS documentation for VARCHAR or CHARACTER VARYING states the following:
... so a
VARCHAR(120)
column consists of a maximum of 120 single-byte characters, 60 two-byte characters, 40 three-byte characters, or 30 four-byte characters.
For a list of UTF-8 characters and their byte lengths, this is a good reference: Complete Character List for UTF-8
Detailed information for the Unicode Character 'LATIN CAPITAL LETTER A WITH DIAERESIS' (U+00C4) can be found here.