Is count(*) really expensive?

Anil Namde · Apr 27, 2010 · Viewed 7.2k times

I have a page where I have 4 tabs displaying 4 different reports based off different tables.

I obtain the row count of each table using a select count(*) from <table> query and display the number of rows available in each table on the tabs. As a result, each page postback executes 5 count(*) queries (4 for the tab counts and 1 for pagination) plus 1 query to fetch the report content.

Now my question is: are count(*) queries really expensive? Should I keep the row counts (at least those displayed on the tabs) in the page's view state instead of querying multiple times?

How expensive are COUNT(*) queries?

Answer

Quassnoi · Apr 27, 2010

In general, the cost of COUNT(*) is proportional to the number of records satisfying the query conditions, plus the time required to prepare these records (which depends on the complexity of the underlying query).

In simple cases where you're dealing with a single table, there are often specific optimisations in place to make such an operation cheap. For example, a COUNT(*) without WHERE conditions on a single MyISAM table in MySQL is instantaneous, because the row count is stored in the table metadata.
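The metadata optimisation can be illustrated with a toy model. This is only a sketch of the idea, not how MyISAM is actually implemented: the class name CountedTable and its methods are hypothetical. The table keeps a counter up to date on every insert and delete, so an unconditional count never touches the rows, while a conditional count still has to scan them.

```python
class CountedTable:
    """Toy table that keeps its row count as metadata (hypothetical, not MyISAM's code)."""

    def __init__(self):
        self._rows = {}       # row id -> row data
        self._row_count = 0   # maintained on every insert and delete

    def insert(self, row_id, row):
        if row_id not in self._rows:
            self._row_count += 1
        self._rows[row_id] = row

    def delete(self, row_id):
        if row_id in self._rows:
            del self._rows[row_id]
            self._row_count -= 1

    def count_all(self):
        # Like COUNT(*) with no WHERE: answered from the counter, no rows visited.
        return self._row_count

    def count_where(self, predicate):
        # Like COUNT(*) with a WHERE condition: every row has to be examined.
        return sum(1 for row in self._rows.values() if predicate(row))


table = CountedTable()
for i in range(1000):
    table.insert(i, {"id": i, "active": i % 2 == 0})
table.delete(3)

total = table.count_all()                          # constant time
active = table.count_where(lambda r: r["active"])  # linear scan
```

The trade-off is that every write pays a small bookkeeping cost so that the unconditional count is free.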

Let's consider two queries:

SELECT  COUNT(*)
FROM    largeTableA a

Since every record satisfies the query, the COUNT(*) cost is proportional to the number of records in the table (i.e., proportional to what it returns), assuming the engine needs to visit the rows and there isn't a specific optimisation in place to handle it.
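To try the first query yourself, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The schema (an id and a payload column) and the row count of 10000 are assumptions made up for the example. SQLite is used only because it ships with Python, so its performance characteristics will differ from those of your own engine.

```python
import sqlite3

# In-memory database with a hypothetical schema for largeTableA.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largeTableA (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO largeTableA (id, payload) VALUES (?, ?)",
    ((i, "row %d" % i) for i in range(10000)),
)


def count_rows(connection):
    # The same unconditional count as in the query above.
    cursor = connection.execute("SELECT COUNT(*) FROM largeTableA a")
    return cursor.fetchone()[0]


total = count_rows(conn)
```

Wrapping the call in a timer and varying the number of inserted rows is a simple way to check how the cost grows on a given engine.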

SELECT  COUNT(*)
FROM    largeTableA a
JOIN    largeTableB b
ON      a.id = b.id

In this case, the engine will most probably use HASH JOIN and the execution plan will be something like this:

  1. Build a hash table on the smaller of the tables
  2. Scan the larger table, looking up each record in the hash table
  3. Count the matches as it goes.

In this case, the COUNT(*) overhead (step 3) will be negligible and the query time will be completely defined by steps 1 and 2, that is, building the hash table and probing it. For such a query, the time will be O(a + b), where a and b are the numbers of rows in the two tables: it does not really depend on the number of matches.
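The three steps can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not any engine's actual implementation: the function name hash_join_count is hypothetical, the inputs are plain lists of join keys, and the hash table stores only a per-key row count, which is enough when the query needs nothing but COUNT(*).

```python
def hash_join_count(ids_a, ids_b):
    """Count the rows of an equi-join on id using a build phase and a probe phase."""
    # Step 1: build a hash table on the smaller input (key -> number of rows).
    if len(ids_a) <= len(ids_b):
        small, large = ids_a, ids_b
    else:
        small, large = ids_b, ids_a
    buckets = {}
    for key in small:
        buckets[key] = buckets.get(key, 0) + 1

    # Steps 2 and 3: scan the larger input, probe the hash table, count matches.
    matches = 0
    for key in large:
        matches += buckets.get(key, 0)
    return matches


joined = hash_join_count([1, 2, 2, 3], [2, 2, 3, 4])
```

Each input is traversed exactly once, so the work is len(ids_a) + len(ids_b) dictionary operations no matter how many pairs match.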

However, if there are indexes on both a.id and b.id, a MERGE JOIN may be chosen instead, and the COUNT(*) time will again be proportional to the number of matches, since an index seek will be performed after each match.
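For comparison, here is a rough sketch of counting with a merge join over two inputs that are already sorted by id, as they would be when read from an index. It is a simplified model, not a real engine's code: the function name merge_join_count is hypothetical and the "rows" are plain sorted lists of keys. Unlike the hash join sketch, it visits every matching pair, so its work grows with the number of matches.

```python
def merge_join_count(sorted_a, sorted_b):
    """Count equi-join matches between two key lists sorted in ascending order."""
    i = j = matches = 0
    while i < len(sorted_a) and j < len(sorted_b):
        if sorted_a[i] < sorted_b[j]:
            i += 1
        elif sorted_a[i] > sorted_b[j]:
            j += 1
        else:
            key = sorted_a[i]
            # Find the end of the run of equal keys in b.
            j_end = j
            while j_end < len(sorted_b) and sorted_b[j_end] == key:
                j_end += 1
            # For every a-row with this key, visit each matching b-row.
            while i < len(sorted_a) and sorted_a[i] == key:
                for _ in range(j, j_end):
                    matches += 1  # one unit of work per matching pair
                i += 1
            j = j_end
    return matches


joined = merge_join_count([1, 2, 2, 3], [2, 2, 3, 4])
```

When the ids are unique on both sides, the number of matches cannot exceed the size of the smaller input, so the difference matters mostly for many-to-many joins.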