SQL Server searching for text in a column

David picture David · Aug 4, 2013 · Viewed 31.9k times · Source

I'm confused over what to use?

Basically I need to have a search string that can search a single column for the occurrences of multiple phrases, each input phrase is separated by a space.

So input from user would be like:

"Phrase1 Phrase2 ... PhraseX"     (number of phrases can 0 to unknown!, but say < 6)

I need to search with the logic:

Where 'Phrase1%' **AND** 'Phrase2%' **AND** ... 'PhraseX%'

.. etc... so all phrases need to be found.

Always logical AND

SO speed, performance taken in to account, Do I use:

Lot's of

Like 'Phrase1%' and like 'Phrase2%' and like ... 'PhraseX%' ?

or use

patindex('Phrase1%' , column) > 0 AND  patindex('Phrase2%' , column) > 0 
AND ...     patindex('PhraseX%' , column) 

or use

add a Full Text search Index,

The use:

Where Contatins(Column, 'Phrase1*') AND Contatins(Column, 'Phrase2*') AND ... Contatins(Column, 'PhraseX*')

Or

????

Almost too many options, which is why I'm asking, what would be the most efficient way of doing this...

Your wisdom is appreciated...

Answer

Gordon Linoff picture Gordon Linoff · Aug 4, 2013

If you are searching for AND, then the correct wildcard search would be:

Like '%Phrase1%' and like '%Phrase2%' and like ... '%PhraseX%' 

There is no reason to use patindex() here, because like is sufficient and well optimized. Well optimized, but this case cannot be made efficient. It is going to require a full table scan. And, if the text field is really, really big (I mean at least thousands or tens of thousands of characters), performance will not be good.

The solution is full text search. You would phrase this as:

where CONTAINS(column, 'Phrase1 AND phrase2 AND . . . ');

The only issue here would be when the "phrases" (which seem to be words) you are looking for are stop words.

In conclusion, if you have more than a few thousand rows or the text field you are searching has more than a few thousand characters, then use the full text option. This is just for guidance. If you are searching through a reference table with 100 rows and looking in the description field that has up to 100 characters, then the like method should be fine.