For my website I use the PHP API for Flickr ( http://www.flickr.com/services/api/ ). This API provides several useful methods to get photos around particular GPS positions.
The call to API methods looks like URL with specific parameters like latitude, longitude, API key, radius, sorting, etc. Say, it'll look like http://api.flickr.com/method?lat=0.0&lon=0.0&radius=10
My website makes over 200,000 calls to API to generate several pages with pictures from Flickr. It is a pritty hard push on API thus I created a results cache in mySQL database.
Simplified scheme of the InnoDB table with cache is:
char(32) request
datetime expires // 2-3 days
text response // serialized data from API response
where request
is a PRIMARY KEY and represents an MD5 hash of a request URI. Other fields are quite simple :)
The problem arises when the table becomes large enough, say over 100,000 rows. New INSERTs
take up to 2 seconds (and up to 6 (!) second with 1,000,000 rows).
As far as I understand the problem is with PRIMARY INDEX and engine being InnoDB. Every time a new request is being inserted, InnoDB engine rebuilds the tree index and moves data around, because MD5(request) is a really random value.
So... The question is whether there is a better way to cache such requests? Or maybe I should switch to MyISAM engine? Or may be I should try pseudo-partitioning and create several tables to solve the problem? Or may be just use not a BTREE but HASH index?
Any ideas are welcome!
Edit:
Ok, I tried to alter table as Furicane and Johan suggested, but still no luck - INSERTs takes up to 3 seconds. Currently request
field became a normal non-unique index and new id
column has been added as PRIMARY KEY with auto increment. Also I tried to add 4 partitions on this table with same results.
I think that index on request
field is still a bottleneck. The only way I currently see is to determine all possible parameters, add them as columns to a table and then create index on them.
Any other ideas? :)
Edit 2:
Salman A in comments below said that his similar table performs much better (~0.03 for insert). This the problem may be in IO load on system. Though I cannot any high load on it.
iostat
results:
avg-cpu: %user %nice %system %iowait %steal %idle
22.94 0.71 8.42 8.50 0.00 59.43
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 38.01 151.04 114.32 1383655437 1047309046
iotop
results:
Total DISK READ: 152.91 K/s | Total DISK WRITE: 197.67 K/s
With mySQL on top of the list both for writing and reading. Maybe my disks are almost dead? How can I check disks performance?
InnoDB does not support hash
keys, only Btree.
MyISAM is infamous for being unreliable.
I think your problem is that you use the MD5 value for the primary key.
The primary key is included in every secondary key. And the PK is forced to be a unique key.
Set an integer autoincrement primary key and set your MD5 value as a normal index.
It does not even need to be unique, because that's a huge part of what's slowing you down.
After this your inserts should run much faster.