Table with 80 million records and adding an index takes more than 18 hours (or forever)! Now what?

Legend picture Legend · Sep 12, 2010 · Viewed 51.6k times · Source

A short recap of what happened. I am working with 71 million records (not much compared to billions of records processed by others). On a different thread, someone suggested that the current setup of my cluster is not suitable for my need. My table structure is:

CREATE TABLE `IPAddresses` (
  `id` int(11) unsigned NOT NULL auto_increment,
  `ipaddress` bigint(20) unsigned default NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM;

And I added the 71 million records and then did a:

ALTER TABLE IPAddresses ADD INDEX(ipaddress);

It's been 14 hours and the operation is still not completed. Upon Googling, I found that there is a well-known approach for solving this problem - Partitioning. I understand that I need to partition my table now based on the ipaddress but can I do this without recreating the entire table? I mean, through an ALTER statement? If yes, there was one requirement saying that the column to be partitioned on should be a primary key. I will be using the id of this ipaddress in constructing a different table so ipaddress is not my primary key. How do I partition my table given this scenario?

Answer

Legend picture Legend · Sep 16, 2010

Ok turns out that this problem was more than just a simple create a table, index it and forget problem :) Here's what I did just in case someone else faces the same problem (I have used an example of IP Address but it works for other data types too):

Problem: Your table has millions of entries and you need to add an index really fast

Usecase: Consider storing millions of IP addresses in a lookup table. Adding the IP addresses should not be a big problem but creating an index on them takes more than 14 hours.

Solution: Partition your table using MySQL's Partitioning strategy

Case #1: When the table you want is not yet created

CREATE TABLE IPADDRESSES(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ipaddress BIGINT UNSIGNED,
  PRIMARY KEY(id, ipaddress)
) ENGINE=MYISAM
PARTITION BY HASH(ipaddress)
PARTITIONS 20;

Case #2: When the table you want is already created. There seems to be a way to use ALTER TABLE to do this but I have not yet figured out a proper solution for this. Instead, there is a slightly inefficient solution:

CREATE TABLE IPADDRESSES_TEMP(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ipaddress BIGINT UNSIGNED,
  PRIMARY KEY(id)
) ENGINE=MYISAM;

Insert your IP addresses into this table. And then create the actual table with partitions:

CREATE TABLE IPADDRESSES(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ipaddress BIGINT UNSIGNED,
  PRIMARY KEY(id, ipaddress)
) ENGINE=MYISAM
PARTITION BY HASH(ipaddress)
PARTITIONS 20;

And then finally

INSERT INTO IPADDRESSES(ipaddress) SELECT ipaddress FROM IPADDRESSES_TEMP;
DROP TABLE IPADDRESSES_TEMP;
ALTER TABLE IPADDRESSES ADD INDEX(ipaddress)

And there you go... indexing on the new table took me about 2 hours on a 3.2GHz machine with 1GB RAM :) Hope this helps.