Why is a UDF so much slower than a subquery?

devzero picture devzero · Feb 4, 2009 · Viewed 10.7k times · Source

I have a case where I need to translate (lookup) several values from the same table. The first way I wrote it, was using subqueries:

SELECT
    (SELECT id FROM user WHERE user_pk = created_by) AS creator,
    (SELECT id FROM user WHERE user_pk = updated_by) AS updater,
    (SELECT id FROM user WHERE user_pk = owned_by) AS owner,
    [name]
FROM asset

As I'm using this subquery a lot (that is, I have about 50 tables with these fields), and I might need to add some more code to the subquery (for example, "AND active = 1" ) I thought I'd put these into a user-defined function UDF and use that. But the performance using that UDF was abysmal.

CREATE FUNCTION dbo.get_user ( @user_pk INT )
RETURNS INT
AS BEGIN 
    RETURN ( SELECT id
             FROM   ice.dbo.[user]
             WHERE  user_pk = @user_pk )
END

SELECT dbo.get_user(created_by) as creator, [name]
FROM asset

The performance of #1 is less than 1 second. Performance of #2 is about 30 seconds...

Why, or more importantly, is there any way I can code in SQL server 2008, so that I don't have to use so many subqueries?

Edit:

Just a litte more explanation of when this is useful. This simple query (that is, get userid) gets a lot more complex when I want to have a text for a user, since I have to join with profile to get the language, with a company to see if the language should be fetch'ed from there instead, and with the translation table to get the translated text. And for most of these queries, performance is a secondary issue to readability and maintainability.

Answer

gbn picture gbn · Feb 4, 2009

The UDF is a black box to the query optimiser so it's executed for every row. You are doing a row-by-row cursor. For each row in an asset, look up an id three times in another table. This happens when you use scalar or multi-statement UDFs (In-line UDFs are simply macros that expand into the outer query)

One of many articles on the problem is "Scalar functions, inlining, and performance: An entertaining title for a boring post".

The sub-queries can be optimised to correlate and avoid the row-by-row operations.

What you really want is this:

SELECT
   uc.id AS creator,
   uu.id AS updater,
   uo.id AS owner,
   a.[name]
FROM
    asset a
    JOIN
    user uc ON uc.user_pk = a.created_by
    JOIN
    user uu ON uu.user_pk = a.updated_by
    JOIN
    user uo ON uo.user_pk = a.owned_by

Update Feb 2019

SQL Server 2019 starts to fix this problem.