Find non-ASCII characters in varchar columns using SQL Server

Gerhard Weiss picture Gerhard Weiss · Oct 8, 2010 · Viewed 127k times · Source

How can rows with non-ASCII characters be returned using SQL Server?
If you can show how to do it for one column would be great.

I am doing something like this now, but it is not working

select *
from Staging.APARMRE1 as ar
where ar.Line like '%[^!-~ ]%'

For extra credit, if it can span all varchar columns in a table, that would be outstanding! In this solution, it would be nice to return three columns:

  • The identity field for that record. (This will allow the whole record to be reviewed with another query.)
  • The column name
  • The text with the invalid character
 Id | FieldName | InvalidText       |
----+-----------+-------------------+
 25 | LastName  | Solís             |
 56 | FirstName | François          |
100 | Address1  | 123 Ümlaut street |

Invalid characters would be any outside the range of SPACE (3210) through ~ (12710)

Answer

Gerhard Weiss picture Gerhard Weiss · Oct 12, 2010

Here is a solution for the single column search using PATINDEX.
It also displays the StartPosition, InvalidCharacter and ASCII code.

select line,
  patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line) as [Position],
  substring(line,patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line),1) as [InvalidCharacter],
  ascii(substring(line,patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line),1)) as [ASCIICode]
from  staging.APARMRE1
where patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line) >0