What is Full Text Search vs LIKE

Nathan W picture Nathan W · Oct 22, 2008 · Viewed 70.9k times · Source

I just read a post mentioning "full text search" in SQL.

I was just wondering what the difference between FTS and LIKE are. I did read a couple of articles but couldn't find anything that explained it well.

Answer

erickson picture erickson · Oct 22, 2008

In general, there is a tradeoff between "precision" and "recall". High precision means that fewer irrelevant results are presented (no false positives), while high recall means that fewer relevant results are missing (no false negatives). Using the LIKE operator gives you 100% precision with no concessions for recall. A full text search facility gives you a lot of flexibility to tune down the precision for better recall.

Most full text search implementations use an "inverted index". This is an index where the keys are individual terms, and the associated values are sets of records that contain the term. Full text search is optimized to compute the intersection, union, etc. of these record sets, and usually provides a ranking algorithm to quantify how strongly a given record matches search keywords.

The SQL LIKE operator can be extremely inefficient. If you apply it to an un-indexed column, a full scan will be used to find matches (just like any query on an un-indexed field). If the column is indexed, matching can be performed against index keys, but with far less efficiency than most index lookups. In the worst case, the LIKE pattern will have leading wildcards that require every index key to be examined. In contrast, many information retrieval systems can enable support for leading wildcards by pre-compiling suffix trees in selected fields.

Other features typical of full-text search are

  • lexical analysis or tokenization—breaking a block of unstructured text into individual words, phrases, and special tokens
  • morphological analysis, or stemming—collapsing variations of a given word into one index term; for example, treating "mice" and "mouse", or "electrification" and "electric" as the same word
  • ranking—measuring the similarity of a matching record to the query string