Firebird default character set

truthseeker · Dec 10, 2012

The SQL select command

SELECT a.RDB$CHARACTER_SET_NAME FROM RDB$DATABASE a

returns NULL. What character set is used when none is specified when creating a new database? Is there a difference between the various versions of Firebird (1.0, 2.0, 2.5.1, etc.)?

Answer

Mark Rotteveel · Dec 10, 2012

When no character set is specified during creation, the default character set of a database is NONE; see page 47 of the InterBase 6.0 Data Definition Guide (available in the documentation section of the Firebird website). This has been the case since before Firebird (probably since the creation of InterBase) and still applies to the existing versions. Under Firebird 2.5, however, RDB$CHARACTER_SET_NAME has the value NONE when a database is created without a default character set. I am not sure whether this was different in earlier versions; my guess is that they still use NONE as the default even if they report NULL.

If you want to be sure, you can simply create a basic table with a CHAR or VARCHAR column without a character set specification, and then use the following query to determine the default:

SELECT a.RDB$FIELD_NAME, a.RDB$RELATION_NAME, 
       b.RDB$CHARACTER_SET_ID, c.RDB$CHARACTER_SET_NAME
FROM RDB$RELATION_FIELDS a
INNER JOIN RDB$FIELDS b 
   ON b.RDB$FIELD_NAME = a.RDB$FIELD_SOURCE
INNER JOIN RDB$CHARACTER_SETS c 
   ON c.RDB$CHARACTER_SET_ID = b.RDB$CHARACTER_SET_ID
WHERE a.RDB$RELATION_NAME = 'TABLE_NAME'

By the way, you can use this query to find the character set of any CHAR or VARCHAR column.
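As a concrete sketch of that check, the script below creates a scratch table and then runs the same lookup against it. The table name CHARSET_TEST and column name TXT are made up for illustration, and the script assumes it is run in isql against a database that was created without a default character set.

```sql
-- Hypothetical scratch table: the column deliberately has no CHARACTER SET clause,
-- so it should pick up whatever the database default is.
CREATE TABLE CHARSET_TEST (
    TXT VARCHAR(10)
);
COMMIT;

-- Same lookup as above, restricted to the scratch table.
SELECT a.RDB$FIELD_NAME, c.RDB$CHARACTER_SET_NAME
FROM RDB$RELATION_FIELDS a
INNER JOIN RDB$FIELDS b
   ON b.RDB$FIELD_NAME = a.RDB$FIELD_SOURCE
INNER JOIN RDB$CHARACTER_SETS c
   ON c.RDB$CHARACTER_SET_ID = b.RDB$CHARACTER_SET_ID
WHERE a.RDB$RELATION_NAME = 'CHARSET_TEST';
```

If the reasoning above holds, the reported character set for TXT should be NONE. The scratch table can be dropped afterwards.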

Character set NONE means that there are no character set assumptions, so you can store data in it in any character set. However, you cannot store it in, or compare it to, a column that has an explicit character set (except maybe character set OCTETS, not sure about that).

If you use NONE, you need to make sure you always use the same connection character set when connecting to the database. If you also use NONE as the connection character set, make sure that your application, driver, access component or programming language always uses the same encoding; otherwise you'll get transliteration problems (character encoding problems).
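In isql, for instance, the connection character set can be fixed with SET NAMES before connecting. This is only a sketch: the database path and the credentials are placeholders, and UTF8 is just one possible choice (it assumes Firebird 2.0 or later).

```sql
-- isql: SET NAMES applies to connections opened after it, so it comes first.
SET NAMES UTF8;

-- Placeholder path and credentials; substitute your own.
CONNECT 'localhost:/path/to/example.fdb' USER 'SYSDBA' PASSWORD 'masterkey';
```

Other drivers and access components expose the same setting under their own connection property, so every client of the database needs to be configured consistently.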

Using NONE as the connection character set has additional issues. For example, the data of a column will always be sent as is, and stored as received, except when a byte combination is not allowed in the character set of the column. Basically, it means the database must be used in the same language environment it was created in.

In general it is better to be explicit about the default character set, unless you know what you are doing.
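A minimal sketch of being explicit at creation time, run from isql. The path, user and password are placeholders, and UTF8 is an assumed choice rather than a recommendation from the documentation cited above.

```sql
-- Placeholder path and credentials; the DEFAULT CHARACTER SET clause is the point here.
CREATE DATABASE 'localhost:/path/to/example.fdb'
    USER 'SYSDBA' PASSWORD 'masterkey'
    DEFAULT CHARACTER SET UTF8;

-- The query from the question should now report the chosen default.
SELECT a.RDB$CHARACTER_SET_NAME FROM RDB$DATABASE a;
```

Columns declared without a CHARACTER SET clause then inherit this default instead of NONE.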