Performance of SQL comparison using substring vs like with wildcard

Swoop  picture Swoop · Sep 15, 2011 · Viewed 21.3k times · Source

I am working on a join condition between 2 tables where one of the columns to match on is a concatentation of values. I need to join columnA from tableA to the first 2 characters of columnB from tableB.

I have developed 2 different statements to handle this and I have tried to analyze the performance of each method.

Method 1:

ON tB.columnB   like  tA.columnA || '%'

Method 2:

ON substr(tB.columnB,1,2) = tA.columnA

The query execution plan has a lot less steps using Method 1 compared to Method 2, however, it looks like Method 2 executes much faster. Also, the execution plan shows a recommended index for Method 2 that could improve its performance.

I am running this on an IBM iSeries, though would be interested in answers in a general sense to learn more about sql query optimization.

Does it make sense that Method 2 would execute faster?

This SO question is similar, but it looks like no one provided any concrete answers to the performance difference of these approaches: T-SQL speed comparison between LEFT() vs. LIKE operator.

PS: The table design that requires this type of join is not something that I can get changed at this time. I realize having the fields separated which hold different types of data would be preferrable.

Answer

earthiverse picture earthiverse · May 6, 2014

I ran the following in the SQL Advisor in IBM Data Studio on one of the tables in my DB2 LUW 10.1 database:

SELECT *
FROM PDM.DB30
WHERE DB30_SYSTEM_ID = 'XXX'
    AND DB30_VERSION_ID = 'YYY'
    AND SUBSTR(DB30_REL_TABLE_NM, 1, 4) = 'ZZZZ'

and

SELECT * 
FROM PDM.DB30 
WHERE DB30_SYSTEM_ID = 'XXX' 
    AND DB30_VERSION_ID = 'YYY' 
    AND DB30_REL_TABLE_NM LIKE 'ZZZZ%' 

They both had the exact same access path utilizing the same index, the same estimated IO cost and the same estimated cardinality, the only difference being the estimated total CPU cost for the LIKE was 178,343.75 while the SUBSTR was 197,518.48 (~10% difference).

The cumulative total cost for both were the same though, so this difference is negligible as per the advisor.