Database index on a column with duplicate values

Gaurav picture Gaurav · Jan 31, 2015 · Viewed 14.7k times · Source

If there is a table containing details of employees including a column Gender whose value can be either M/F. Now would it make sense to create an index on this column, would it make the search faster? Logically if we fire a select statement with where clause containing Gender as the column, it should cut down the search time by half. But I have heard that this kind of index will not help and would be actually ignored by the Database Optimizer while executing the query. But I am not getting why? Can somebody please explain?

Answer

Barmar picture Barmar · Jan 31, 2015

In most cases, only one index can be used to optimize a database query. If a query needs to match several indexed columns, the query planner will have to decide which of these indexes to use. Each index has a cardinality, which is roughly the number of different values across the table. An index with higher cardinality will be more effective, because selecting rows that match the index will result in very few rows to scan to match the other conditions.

An index on a gender column will only cut the table in half. Any other index will be more effective.

As an analogy, think of phone books. If you had a single phone book for an entire country, it would be huge and hard to search for the specific person you want. So phone books are usually made for just a city, or a few cities in an area, to make them reasonable sizes. But if you instead had a "Male phone book" instead of regional phone books, it would be nearly as unusable as a phone book for the entire country. The criteria for creating new phone books is that they should be much smaller than a book for the entire country. A factor of 2 reduction isn't very useful when you're starting with an enormous size.