MySQL Memory Engine vs InnoDB on RAMdisk

Kavi Siegel picture Kavi Siegel · Jul 25, 2013 · Viewed 10.1k times · Source

I'm writing a bit of software that needs to flatten data from a hierarchical type of format into tabular format. Instead of doing it all in a programming language every time and serving it up, I want to cache the results for a few seconds, and use SQL to sort and filter. When in use, we're talking 400,000 writes and 1 or 2 reads over the course of those few seconds.

Each table will contain 3 to 15 columns. Each row will contain from 100 bytes to 2,000 bytes of data, although it's possible that in some cases, some rows may get up to 15,000 bytes. I can clip data if necessary to keep things sane.

The main options I'm considering are:

MySQL's Memory engine

A good option, almost specifically written for my use case! But.. "MEMORY tables use a fixed-length row-storage format. Variable-length types such as VARCHAR are stored using a fixed length. MEMORY tables cannot contain BLOB or TEXT columns." - Unfortunately, I do have text fields with a length up to maybe 10,000 characters - and even that is a number that is not specifically limited. I could adjust the varchar length based on the max length of text columns as I loop through doing my flattening, but that's not totally elegant. Also, for my occasional 15,000 character row, does that mean I need to allocate 15,000 characters for every row in the database? If there was 100,000 rows, that's 1.3 gb not including overhead!

InnoDB on RAMDisk

This is meant to run on the cloud, and I could easily spin up a server with 16gb of ram, configure MySQL to write to tmpfs and use full featured MySQL. My concern for this is space. While I'm sure engineers have written the memory engine to prevent consuming all temp storage and crashing the server, I doubt this solution would know when to stop. How much actual space will my 2,000 bytes of data consume when in database format? How can I monitor it?

Bonus Questions

Indexes I will in fact know in advance which columns need to be filtered and sorted by. I could set up an index before I do inserts, but what kind of performance gain could I honestly expect on top of a ram disk? How much extra overhead to indexes add?

Inserts I'm assuming inserting multiple rows with one query is faster. But the one query, or series of large queries are stored in memory, and we're writing to memory, so if I did that I'd momentarily need double the memory. So then we talk about doing one or two or a hundred at a time, and having to wait for that to complete before processing more.. InnoDB doesn't lock the table but I worry about sending two queries too close to each other and confusing MySQL. Is this a valid concern? With the MEMORY engine I'd have to definitely wait for completion, due to table locks.

Temporary Are there any benefits to temporary tables other than the fact that they're deleted when the db connection closes?

Answer

O. Jones picture O. Jones · Jul 25, 2013

I suggest you use MyISAM. Create your table with appropriate indexes for your query. Then disable keys, load the table, and enable keys.

I suggest you develop a discipline like this for your system. I've used a similar discipline very effectively.

Keep two copies of the table. Call one table_active and the second one table_loading.

When it's time to load a new copy of your data, use commands like this.

  ALTER TABLE table_loading DISABLE KEYS;
  /* do your insertions here, to table_loading */
  /* consider using LOAD DATA INFILE if it makes sense.  */
  ALTER TABLE table_loading ENABLE KEYS;  /* this will take a while */
  /* at this point, suspend your software that's reading table_active */
  RENAME TABLE table_active TO table_old;
  RENAME TABLE table_loading TO table_active;
  /* now you can resume running your software */
  TRUNCATE TABLE table_old;
  RENAME TABLE table_old TO table_loading;

Alternatively, you can DROP TABLE table_old; and create a new table for table_loading instead of the last rename.

This two-table (double-buffered) strategy should work pretty well. It will create some latency because your software that's reading the table will work on an old copy. But you'll avoid reading from an incompletely loaded table.

I suggest MyISAM because you won't run out of RAM and blow up and you won't have the fixed-row-length overhead or the transaction overhead. But you might also consider MariaDB and the Aria storage engine, which does a good job of exploiting RAM buffers.

If you do use the MEMORY storage engine, be sure to tweak your max_heap_table_size system variable. If your read queries will use index range scans (sequential index access) be sure to specify BTREE style indexes. See here: http://dev.mysql.com/doc/refman/5.1/en/memory-storage-engine.html