What is the difference between Seq Scan and Bitmap heap scan in postgres?

Sunil picture Sunil · Jan 4, 2009 · Viewed 24.2k times · Source

In output of explain command I found two terms 'Seq Scan' and 'Bitmap heap Scan'. Can somebody tell me what is the difference between these two types of scan? (I am using PostgreSql)

Answer

derobert picture derobert · Jan 4, 2009

http://www.postgresql.org/docs/8.2/static/using-explain.html

Basically, a sequential scan is going to the actual rows, and start reading from row 1, and continue until the query is satisfied (this may not be the entire table, e.g., in the case of limit)

Bitmap heap scan means that PostgreSQL has found a small subset of rows to fetch (e.g., from an index), and is going to fetch only those rows. This will of course have a lot more seeking, so is faster only when it needs a small subset of the rows.

Take an example:

create table test (a int primary key, b int unique, c int);
insert into test values (1,1,1), (2,2,2), (3,3,3), (4,4,4), (5,5,5);

Now, we can easily get a seq scan:

explain select * from test where a != 4

                       QUERY PLAN                        
---------------------------------------------------------
 Seq Scan on test  (cost=0.00..34.25 rows=1930 width=12)
   Filter: (a <> 4)

It did a sequential scan because it estimates its going to grab the vast majority of the table; seeking to do that (instead of a big, seekless read) would be silly.

Now, we can use the index:

explain select * from test where a = 4 ;
                              QUERY PLAN                              
----------------------------------------------------------------------
 Index Scan using test_pkey on test  (cost=0.00..8.27 rows=1 width=4)
   Index Cond: (a = 4)

And finally, we can get some bitmap operations:

explain select * from test where a = 4 or a = 3;
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=8.52..13.86 rows=2 width=12)
   Recheck Cond: ((a = 4) OR (a = 3))
   ->  BitmapOr  (cost=8.52..8.52 rows=2 width=0)
         ->  Bitmap Index Scan on test_pkey  (cost=0.00..4.26 rows=1 width=0)
               Index Cond: (a = 4)
         ->  Bitmap Index Scan on test_pkey  (cost=0.00..4.26 rows=1 width=0)
               Index Cond: (a = 3)

We can read this as:

  1. Build a bitmap of the rows we want for a=4. (Bitmap index scan)
  2. Build a bitmap of the rows we want for a=3. (Bitmap index scan)
  3. Or the two bitmaps together (BitmapOr)
  4. Look those rows up in the table (Bitmap Heap Scan) and check to make sure a=4 or a=3 (recheck cond)

[Yes, these query plans are stupid, but that's because we failed to analyze test Had we analyzed it, they'd all be sequential scans, since there are 5 tiny rows]