MySQL different treatment between VarChar(255) and VarChar(65536)

TheSquad picture TheSquad · May 5, 2011 · Viewed 27.5k times · Source

Does anyone know the difference between using VarChar(255) and VarChar(65536) ?

Here's what I know so far :

  • VarChar(255) will only use one byte for size
  • VarChar(65536) will use two bytes for size
  • VarChar(65536) exists only since MySQL 5.0.3
  • MySQL use a different treatment between 255 and 65536 (don't know the difference though)

What I'm not sure of if is it any performance difference between those varchars.

Let's say I would like to create a table with a multiple type of row. Using an enum with 'data_type1' and 'data_type2'

data_type1 will never have more than 255 characters in the varchar data_type2 will mostly have more than 255 characters in the varchar

Which solution table would be better ?

id (int) - autoincrement
type (enum : data_type1, data_type2)
msg (varchar(255))
data (TEXT)

Using only the 'data' column when type == data_type2 ?

or

id (int) - autoincrement
type (enum : data_type1, data_type2)
msg (varchar(65536))

Using 'msg' column whatever type is present ?

The type column is actually wanted on both solution for treatment purpose.

EDIT :

When type == data_type2, the data stored will actually never be over 10000 characters

EDIT 2 :

I'm not looking to search in columns msg and data

This is a simple question about storage performance, not indexes or search perf...

Answer

Michael J.V. picture Michael J.V. · May 5, 2011

You mentioned some things that are true, I'll explain how VARCHAR works.

If you specify VARCHAR(60) that means it can hold up to 60 characters. If it contains less characters, let's say 50 - then MySQL uses 50 bytes to store the data instead of 60.

It's quite the opposite with CHAR(60) - it reserves 60 bytes regardless of the length of the string you want to save.

Now, how does VARCHAR work? If you specify it as VARCHAR(255), that means that the column will reserve 1 byte + length of the string bytes. That 1 byte indicates how long the string is. 1 byte = you can have from 0 - 255 values stored (2 to the power of 8 = 256).

As for VARCHAR that is above 255, you need to somehow store how many bytes are being used. Since 1 byte can store only up to 256 distinct values (0 - 255), you need to use two bytes. Two to the power of 16 = 65536, which means you can store any string up to that size and then it adds up 2 bytes to indicate how long the string is.

So, to cut it short - performance difference is that if you have VARCHAR(65536) and you use 200 bytes to store the text - you are wasting that 1 extra byte that VARCHAR(65536) will use. One might think "oh, but that's just 1 byte, who cares about that". A lot of people do actually - imagine having several VARCHAR columns on a table that's got 50 million records. Say you got 3 varchar columns, each wastes additional byte - that's 3 bytes * 50 million ~ 144 megabytes of wasted space. The funny thing is, it's not just wasted space. It also contributes to processing overhead and using extra RAM when you want to read something. And who says there's going to be only 1 table in your DB that's going to get large?

Knowing this information might help you deciding yourself what's the best to use.

http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html if you didn't check it before, it explains how every data type is stored and how much space it requires.