How to insert a very large number of records into a MySql database as fast as possible

Hancy picture Hancy · Dec 12, 2011 · Viewed 19.4k times · Source

I have a database table like below:

create table temperature
(id int unsigned not null auto_increment primary key,
temperature double
);

And in my program I got about 20 million temperature to insert into the table. I worke in .Net environment, use Connector/Net connecting to MySql. The code was like below:

List<double> temps = new List<double>();
...
string connStr = "server=localhost;user=name;database=test;port=3306;password=*****;";
MySqlConnection conn = new MySqlConnection(connStr);
try
{
    conn.Open();

    //temps.Count is about 20 million
    for (int i = 0; i < temps.Count; i++)
    {
        string sql1 = "INSERT INTO temperature VALUES (null, "+temps[i]+")";
        MySqlCommand cmd1 = new MySqlCommand(sql1, conn);
        cmd1.ExecuteNonQuery();
    }

}
catch (Exception ex)
{
    Console.WriteLine(ex.ToString());
}
conn.Close();

How can i insert so many lines data as fast as possible? (It can only insert 2000 records every minute in my computer.)

Answer

Thomas Luzat picture Thomas Luzat · Feb 15, 2013

There are a number of ways to optimize bulk inserts. Some are:

  • LOAD DATA INFILE. There is a wrapper API for .NET. This is the fastest way, but has some limitations and semantic differences from simple inserts.

  • Multiple-row INSERT statements:

    INSERT INTO temperature (temperature) VALUES (1.0), (2.0), (3.0), ...

    You should not insert 20.000.000 rows at once, but may want to try 1.000-10.000 for a very large speed-up. This is a simple and very unproblematic way to increase speed. A factor of 10 and sometimes way more is often possible.

  • Locking the table (LOCK TABLES).

  • Disabling indexes temporarily.

  • MySQL options tuning.

  • INSERT DELAYED (most likely not that useful here).

The documentation does give you more elaborate detail on the options. Some options depend on the table type (InnoDB vs. MyISAM).

A general suggestion: Always specify the columns that you insert in front of VALUES. This makes for more maintainable code.