I need to clean one character column and for that I am using REGEXP_REPLACE function in Teradata 14.
The same piece of code worked for some other data source (having the same LATIN encoding).
The data definition using show table has given me below format of the data:
CREATE SET TABLE pp_oap_cj_t.dc_loss_fdr_kn ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
( PARENT_ID DECIMAL(38,0),
FS_MRCH_NM VARCHAR(25) CHARACTER SET LATIN NOT CASESPECIFIC
) PRIMARY INDEX ( PARENT_ID );
The query I am performing is as below:
CREATE TABLE pp_oap_pb_piyush_t.CHECKMERCHANT1 AS (
SELECT
FS_MRCH_NM,
REGEXP_REPLACE(trim(Upper(trim(REGEXP_REPLACE( (FS_MRCH_NM ) , '[^a-z]',' ',1,0,'i'))) ), '[[:space:]]+',' ',1,0,'i') as cleaned_merchant
FROM pp_oap_pb_piyush_t.CHECKMERCHANT)
WITH DATA PRIMARY INDEX (FS_MRCH_NM);
Error
CREATE TABLE Failed. 6706: The string contains an untranslatable character.
I need a quick turnaround this bottleneck.
Help is really appreciated ! Thanks !!!!
REGEXP_REPLACE under the hood converts character set Latin to Unicode. You have defined your variable as character set Latin. You see the error when data has something which cannot be converted from Latin to Unicode. Best thing is to fix your DDL to have character set as Unicode instead of Latin. something like TRANSLATE(FS_MRCH_NM USING LATIN_TO_UNICODE WITH ERROR) in your code instead of FS_MRCH_NM should work. Problem with this it result in null values when you have untranslatable characters.