Invalid digits on Redshift

Maurício Borges picture Maurício Borges · Mar 19, 2018 · Viewed 30.5k times · Source

I'm trying to load some data from stage to relational environment and something is happening I can't figure out.

I'm trying to run the following query:

SELECT
  CAST(SPLIT_PART(some_field,'_',2) AS BIGINT) cmt_par
FROM
  public.some_table;

The some_field is a column that has data with two numbers joined by an underscore like this:

some_field -> 38972691802309_48937927428392

And I'm trying to get the second part.

That said, here is the error I'm getting:

[Amazon](500310) Invalid operation: Invalid digit, Value '1', Pos 0, 
Type: Long 
Details: 
 -----------------------------------------------
  error:  Invalid digit, Value '1', Pos 0, Type: Long 
  code:      1207
  context:   
  query:     1097254
  location:  :0
  process:   query0_99 [pid=0]
  -----------------------------------------------;

Execution time: 2.61s
Statement 1 of 1 finished

1 statement failed.

It's literally saying some numbers are not valid digits. I've already tried to get the exactly data which is throwing the error and it appears to be a normal field like I was expecting. It happens even if I throw out NULL fields.

I thought it would be an encoding error, but I've not found any references to solve that. Anyone has any idea?

Thanks everybody.

Answer

szeitlin picture szeitlin · Oct 17, 2018

I just ran into this problem and did some digging. Seems like the error Value '1' is the misleading part, and the problem is actually that these fields are just not valid as numeric.

In my case they were empty strings. I found the solution to my problem in this blogpost, which is essentially to find any fields that aren't numeric, and fill them with null before casting.

select cast(colname as integer) from
(select
 case when colname ~ '^[0-9]+$' then colname
 else null
 end as colname
 from tablename);

Bottom line: this Redshift error is completely confusing and really needs to be fixed.