Error: String contains an untranslatable character - TERADATA (for REGEXP_REPLACE operation)

Piyush Upadhyay picture Piyush Upadhyay · Jul 18, 2017 · Viewed 14.5k times · Source

I need to clean one character column and for that I am using REGEXP_REPLACE function in Teradata 14.

The same piece of code worked for some other data source (having the same LATIN encoding).

The data definition using show table has given me below format of the data:

CREATE SET TABLE pp_oap_cj_t.dc_loss_fdr_kn ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
( PARENT_ID DECIMAL(38,0),
 FS_MRCH_NM VARCHAR(25) CHARACTER SET LATIN NOT CASESPECIFIC
) PRIMARY INDEX ( PARENT_ID );

The query I am performing is as below:

CREATE TABLE pp_oap_pb_piyush_t.CHECKMERCHANT1 AS (
 SELECT 
 FS_MRCH_NM,
REGEXP_REPLACE(trim(Upper(trim(REGEXP_REPLACE( (FS_MRCH_NM ) , '[^a-z]',' ',1,0,'i'))) ), '[[:space:]]+',' ',1,0,'i')   as cleaned_merchant
 FROM pp_oap_pb_piyush_t.CHECKMERCHANT)
  WITH DATA PRIMARY INDEX (FS_MRCH_NM);

Error

CREATE TABLE Failed. 6706:  The string contains an untranslatable character.

I need a quick turnaround this bottleneck.

Help is really appreciated ! Thanks !!!!

Answer

Kiran  picture Kiran · Jul 19, 2017

REGEXP_REPLACE under the hood converts character set Latin to Unicode. You have defined your variable as character set Latin. You see the error when data has something which cannot be converted from Latin to Unicode. Best thing is to fix your DDL to have character set as Unicode instead of Latin. something like TRANSLATE(FS_MRCH_NM USING LATIN_TO_UNICODE WITH ERROR) in your code instead of FS_MRCH_NM should work. Problem with this it result in null values when you have untranslatable characters.