Slow insert speed in Postgresql memory tablespace

Prashant picture Prashant · May 28, 2010 · Viewed 9.7k times · Source

I have a requirement where I need to store the records at rate of 10,000 records/sec into a database (with indexing on a few fields). Number of columns in one record is 25. I am doing a batch insert of 100,000 records in one transaction block. To improve the insertion rate, I changed the tablespace from disk to RAM.With that I am able to achieve only 5,000 inserts per second.

I have also done the following tuning in the postgres config:

  • Indexes : no
  • fsync : false
  • logging : disabled

Other information:

  • Tablespace : RAM
  • Number of columns in one row : 25 (mostly integers)
  • CPU : 4 core, 2.5 GHz
  • RAM : 48 GB

I am wondering why a single insert query is taking around 0.2 msec on average when database is not writing anything on disk (as I am using RAM based tablespace). Is there something I am doing wrong?

Help appreciated.

Prashant

Answer

Dave Jarvis picture Dave Jarvis · May 28, 2010

Fast Data Loading

  1. Translate your data to CSV.
  2. Create a temporary table (as you noted, without indexes).
  3. Execute a COPY command: \COPY schema.temp_table FROM /tmp/data.csv WITH CSV
  4. Insert the data into the non-temporary table.
  5. Create indexes.
  6. Set appropriate statistics.

Further Recommendations

For large volumes of data:

  1. Split the data into child tables.
  2. Insert it in order of the column from which most of the SELECT statements will use. In other words, try to align the physical model with the logical model.
  3. Adjust your configuration settings.
  4. Create a CLUSTER index (most important column on the left). For example:
    CREATE UNIQUE INDEX measurement_001_stc_index
      ON climate.measurement_001
      USING btree
      (station_id, taken, category_id);
    ALTER TABLE climate.measurement_001 CLUSTER ON measurement_001_stc_index;

Configuration Settings

On a machine with 4GB of RAM, I did the following...

Kernel Configuration

Tell the Kernel that it's okay for programs to use gobs of shared memory:

sysctl -w kernel.shmmax=536870912
sysctl -p /etc/sysctl.conf

PostgreSQL Configuration

  1. Edit /etc/postgresql/8.4/main/postgresql.conf and set:
    shared_buffers = 1GB
    temp_buffers = 32MB
    work_mem = 32MB
    maintenance_work_mem = 64MB
    seq_page_cost = 1.0
    random_page_cost = 2.0
    cpu_index_tuple_cost = 0.001
    effective_cache_size = 512MB
    checkpoint_segments = 10
  2. Tweak the values as necessary and suitable to your environment. You will probably have to change them for suitable read/write optimization later.
  3. Restart PostgreSQL.

Child Tables

For example, let's say you have data based on weather, divided into different categories. Rather than having a single monstrous table, divide it into several tables (one per category).

Master Table

CREATE TABLE climate.measurement
(
  id bigserial NOT NULL,
  taken date NOT NULL,
  station_id integer NOT NULL,
  amount numeric(8,2) NOT NULL,
  flag character varying(1) NOT NULL,
  category_id smallint NOT NULL,
  CONSTRAINT measurement_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

Child Table

CREATE TABLE climate.measurement_001
(
-- Inherited from table climate.measurement_001:  id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
-- Inherited from table climate.measurement_001:  taken date NOT NULL,
-- Inherited from table climate.measurement_001:  station_id integer NOT NULL,
-- Inherited from table climate.measurement_001:  amount numeric(8,2) NOT NULL,
-- Inherited from table climate.measurement_001:  flag character varying(1) NOT NULL,
-- Inherited from table climate.measurement_001:  category_id smallint NOT NULL,
  CONSTRAINT measurement_001_pkey PRIMARY KEY (id),
  CONSTRAINT measurement_001_category_id_ck CHECK (category_id = 1)
)
INHERITS (climate.measurement)
WITH (
  OIDS=FALSE
);

Table Statistics

Bump up the table stats for the important columns:

ALTER TABLE climate.measurement_001 ALTER COLUMN taken SET STATISTICS 1000;
ALTER TABLE climate.measurement_001 ALTER COLUMN station_id SET STATISTICS 1000;

Don't forget to VACUUM and ANALYSE afterwards.