Oracle 11G - Performance effect of indexing at insert

mon picture mon · Jul 2, 2015 · Viewed 8.4k times · Source

Objective

Verify if it is true that insert records without PK/index plus create thme later is faster than insert with PK/Index.

Note
The point here is not about indexing takes more time (it is obvious), but the total cost (Insert without index + create index) is higher than (Insert with index). Because I was taught to insert without index and create index later as it should be faster.

Environment

Windows 7 64 bit on DELL Latitude core i7 2.8GHz 8G memory & SSD HDD
Oracle 11G R2 64 bit

Background

I was taught that insert records without PK/Index and create them after insert would be faster than insert with PK/Index.

However 1 million record inserts with PK/Index was actually faster than creating PK/Index later, approx 4.5 seconds vs 6 seconds, with the experiments below. By increasing the records to 3 million (999000 -> 2999000), the result was the same.

Conditions

  • The table DDL is below. One bigfile table space for both data and index.
    (Tested a separate index tablespace with the same result & inferior overall perforemace)
  • Flush the buffer/spool before each run.
  • Run the experiment 3 times each and made sure the results were similar.

SQL to flush:

ALTER SYSTEM CHECKPOINT;
ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH BUFFER_CACHE;

Question

Would it be actually true that "insert witout PK/Index + PK/Index creation later" is faster than "insert with PK/Index"?

Did I make mistakes or missed some conditions in the experiment?

Insert records with PK/Index

TRUNCATE TABLE TBL2;
ALTER TABLE TBL2 DROP CONSTRAINT PK_TBL2_COL1 CASCADE;
ALTER TABLE TBL2 ADD  CONSTRAINT PK_TBL2_COL1 PRIMARY KEY(COL1) ;

SET timing ON
INSERT INTO TBL2
SELECT i+j, rpad(TO_CHAR(i+j),100,'A')
FROM (
  WITH DATA2(j) AS (
      SELECT 0 j FROM DUAL
      UNION ALL
      SELECT j+1000 FROM DATA2 WHERE j < 999000
  )
  SELECT j FROM DATA2
),
(
  WITH DATA1(i) AS (
      SELECT 1 i FROM DUAL
      UNION ALL
      SELECT i+1 FROM DATA1 WHERE i < 1000
  )
  SELECT i FROM DATA1
);
commit;

1,000,000 rows inserted.
Elapsed: 00:00:04.328 <----- Insert records with PK/Index

Insert records without PK/Index and create them after

TRUNCATE TABLE TBL2;
ALTER TABLE &TBL_NAME DROP CONSTRAINT PK_TBL2_COL1 CASCADE;

SET TIMING ON
INSERT INTO TBL2
SELECT i+j, rpad(TO_CHAR(i+j),100,'A')
FROM (
  WITH DATA2(j) AS (
      SELECT 0 j FROM DUAL
      UNION ALL
      SELECT j+1000 FROM DATA2 WHERE j < 999000
  )
  SELECT j FROM DATA2
),
(
  WITH DATA1(i) AS (
      SELECT 1 i FROM DUAL
      UNION ALL
      SELECT i+1 FROM DATA1 WHERE i < 1000
  )
  SELECT i FROM DATA1
);
commit;
ALTER TABLE TBL2 ADD CONSTRAINT PK_TBL2_COL1 PRIMARY KEY(COL1) ;

1,000,000 rows inserted.
Elapsed: 00:00:03.454 <---- Insert without PK/Index

table TBL2 altered.
Elapsed: 00:00:02.544 <---- Create PK/Index

Table DDL

CREATE TABLE TBL2 (
    "COL1" NUMBER,
    "COL2" VARCHAR2(100 BYTE),
    CONSTRAINT "PK_TBL2_COL1" PRIMARY KEY ("COL1")
) TABLESPACE "TBS_BIG" ;

Answer

Jon Heller picture Jon Heller · Jul 3, 2015

The current test case is probably good enough for you to overrule the "best practices". There are too many variables involved to make a blanket statement that "it's always best to leave the indexes enabled". But you're probably close enough to say it's true for your environment.

Below are some considerations for the test case. I've made this a community wiki in the hopes that others will add to the list.

  1. Direct-path inserts. Direct-path writes use different mechanisms and may work completely differently. Direct-path inserts can often be significantly faster than regular inserts, although they have some complicated restrictions (for example, triggers must be disabled) and disadvantages (the data is not immediately backed-up). One particular way it affects this scenario is that NOLOGGING for indexes only applies during index creation. So even if a direct-path insert is used, an enabled index will always generate REDO and UNDO.
  2. Parallelism. Large insert statements often benefit from parallel DML. Usually it's not worth worrying about the performance of bulk loads until it takes more than several seconds, which is when parallelism starts to be useful.
  3. Bitmap indexes are not meant for large DML. Inserts or updates to a table with a bitmap index can lock the whole table and lead to disastrous performance. It might be helpful to limit the test case to b-tree indexes.
  4. Add alter system switch logfile;? Log file switches can sometimes cause performance issues. The tests would be somewhat more consistent if they all started with empty logfiles.
  5. Move data generation logic into a separate step. Hierarchical queries are useful for generating data but they can have their own performance issues. It might be better to create in intermediate table to hold the results, and then only test inserting the intermediate table into the final table.