MySQL char & varchar character sets & storage sizes

pospi picture pospi · Apr 10, 2012 · Viewed 12.2k times · Source

Wondering how much actual storage space will be taken up by these two datatypes, as the MySQL documentation is slightly unclear on the matter.

CHAR(M) M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set

VARCHAR(M), VARBINARY(M) L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes

This seems to imply to me that, given a utf8-encoded database, a CHAR will always take up 32 bits per character, whilst a VARCHAR will take between 8 and 32 depending on the actual byte length of the characters stored. Is that correct? Or does a VARCHAR imply an 8-bit character width, and storing multi-octet UTF8 characters actually consumes multiple 'characters' from the VARCHAR? Or does the VARCHAR also always store 32 bits per character? So many possibilities.

Not something I've ever had to worry this much about before, but I'm starting to hit in-memory temp table size limits and I don't necessarily want to have to increase MySQL's available pool (for the second time).

Answer

Celada picture Celada · Apr 10, 2012

CHAR and VARCHAR both count characters. Both of them count the maximum storage that they might require given the character encoding and length. For ASCII, that's 1 byte per character. For UTF-8, that's 3 bytes per character (not 4 as you'd expect, because MySQL's Unicode support is crippled for some reason, and it doesn't support any Unicode characters which would require 4 bytes in UTF-8). So far, CHAR and VARCHAR are the same.

Now, CHAR just goes ahead and reserves this amount of storage.

VARCHAR instead allocated 1 or 2 bytes, depending on whether this maximum storage is < 256 or ≥ 256. And the actual amount of space occupied by the entry is these one or two bytes, plus the amount of space actually occupied by the string.

Interestingly, this makes 85 a magic number for UTF-8 VARCHAR:

  • VARCHAR(85) uses 1 byte for the length because the maximum possible length of 85 (crippled) UTF-8 characters is 3 × 85 = 255.
  • VARCHAR(86) uses 2 byte for the length because the maximum possible length of 86 (crippled) UTF-8 characters is 3 × 86 = 258.