Questions about accent insensitivity in SQL Server (Latin1_General_CI_AS)

Brett Postin · Jan 25, 2013

All our databases were installed using the default collation (Latin1_General_CI_AS).

We plan to change the collation to allow clients to search the database with accent insensitivity.

Questions:

  1. What are the negatives (if any) of having an accent insensitive database?

  2. Are there any performance overheads for an accent insensitive database?

  3. Why is the default for SQL Server collation accent sensitive; why would anyone want accent sensitive by default?

Answer

Ben · Jan 25, 2013

Seriously, changing database collations is a royal pain. See this HOWTO from CodeProject, and then think hard before you do it! And what it describes is the EASY way!

First, you can allow accent-insensitive searches simply by specifying the collation as part of the search; you don't necessarily have to change the database collation.

    select * from TableName
    where name collate Latin1_General_CI_AI like @parameter

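Simple as that. However, this will hurt index usage: because the COLLATE is applied to the column itself, an ordinary index on name (which is ordered by the accent-sensitive collation) cannot be used for a seek.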
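As a minimal, self-contained sketch of the per-query override, the batch below uses a table variable instead of a real table. The name @people and the sample rows are made up for illustration, and the DECLARE initializer and multi-row VALUES assume SQL Server 2008 or later.

    -- Hypothetical sample data; the column is accent sensitive, like the question's default collation
    declare @people table (name nvarchar(20) collate Latin1_General_CI_AS);
    insert into @people (name) values (N'José'), (N'Jose'), (N'Josh');

    declare @parameter nvarchar(20) = N'jose%';

    -- Accent-insensitive search: the COLLATE override applies to this comparison only
    select name from @people
    where name collate Latin1_General_CI_AI like @parameter;   -- José and Jose

    -- Same search using the column's own accent-sensitive collation
    select name from @people
    where name like @parameter;                                -- Jose only

The override changes nothing about how the data is stored; it only affects how this one comparison is evaluated.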

An alternative is to add a computed column with the accent-insensitive collation, which you can index separately.

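    -- Base table: name is stored with the accent-sensitive default collation
    create table TableName(
    ix int identity primary key,
    name nvarchar(20) collate Latin1_General_CI_AS
    )
    go
    -- Computed column: the same value, compared accent-insensitively
    alter table TableName
    add  name_AI as name collate Latin1_General_CI_AI
    go
    -- Index the computed column so accent-insensitive searches can use it
    create index IX_TableName_name_AI
    on dbo.TableName(name_AI)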
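A short usage sketch for the computed column, assuming the table and index above exist. The sample rows and the search pattern are illustrative only. Note that a table with an indexed computed column requires the usual SET options (for example ANSI_NULLS and QUOTED_IDENTIFIER ON) for inserts; these are the defaults in SSMS.

    -- Hypothetical sample rows
    insert into dbo.TableName (name) values (N'José'), (N'Jose'), (N'Josh');

    declare @parameter nvarchar(20) = N'jose%';

    -- Search the accent-insensitive column; no COLLATE is needed in the query,
    -- so the optimizer is free to consider IX_TableName_name_AI
    select ix, name
    from dbo.TableName
    where name_AI like @parameter;   -- José and Jose

Whether the index is actually chosen depends on the pattern (a leading wildcard rules out a seek) and on the optimizer's cost estimates, so it is worth checking the execution plan.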

The computed-column approach puts the extra column in the table itself, but you could just as well create an indexed view.

    create view dbo.TableName_AI
    with schemabinding
    as 
    select ix,
    name collate Latin1_General_CI_AI as name
    from dbo.TableName
    go
    -- Need a unique clustered index first
    create unique clustered index IX_TableName_AI_Clustered on dbo.TableName_AI(ix)
    -- then the index for searching
    create index IX_TableName_AI_name on dbo.TableName_AI(name)

Then, for accent-insensitive searches, use the view TableName_AI.
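A sketch of querying the view, assuming the view and both indexes above have been created. The NOEXPAND hint is included because, on editions other than Enterprise, the optimizer does not use a view's indexes unless the view is referenced with that hint; on Enterprise it is optional. The search pattern is illustrative.

    declare @parameter nvarchar(20) = N'jose%';

    -- name here is the view's accent-insensitive column
    select ix, name
    from dbo.TableName_AI with (noexpand)
    where name like @parameter;

Accent-sensitive searches continue to go against dbo.TableName directly, so both behaviours are available side by side.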

To answer your specific questions:

  1. In an accent-insensitive database, accent-sensitive searches will be slower, because they then need the same kind of COLLATE override shown above.

  2. Yes, but not so you would notice.

  3. It just is. Something has to be the default; if you don't like it, don't use the default!

    Think of it this way: "Hard" and "Herd" are not the same word. That one vowel difference is enough - even though they sound similar.

    An accent difference (a vs. á) is somewhere between a case difference (A vs. a) and a letter difference (a vs. e). You have to draw the line somewhere.

    An accent affects the sound of the word and can make it have a different meaning, though I struggle to think of examples. I guess it makes more sense to someone who has words in their database in a language which makes use of accents.
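To see where each collation "draws the line", the following sketch compares the three kinds of difference mentioned above under three collations. The column aliases and pair labels are made up for illustration; the table value constructor in FROM assumes SQL Server 2008 or later.

    select
        v.pair,
        case when v.a = v.b collate Latin1_General_CI_AS then 'equal' else 'different' end as ci_as,
        case when v.a = v.b collate Latin1_General_CI_AI then 'equal' else 'different' end as ci_ai,
        case when v.a = v.b collate Latin1_General_CS_AS then 'equal' else 'different' end as cs_as
    from (values
        (N'case:   A vs a', N'A', N'a'),
        (N'accent: a vs á', N'a', N'á'),
        (N'letter: a vs e', N'a', N'e')
    ) as v(pair, a, b);

    -- pair             ci_as      ci_ai      cs_as
    -- case:   A vs a   equal      equal      different
    -- accent: a vs á   different  equal      different
    -- letter: a vs e   different  different  different

The default Latin1_General_CI_AS ignores case but not accents; the CI_AI variant moves the line one step further and ignores both.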