OK so I have read a whole bunch of articles suggesting table-value functions and cross apply give better performance than a scalar udf. I wanted to write my function in both ways and then test to see which one is better - but I cannot figure out what I'm supposed to use/look for to understand which is the better option.
I'm using SQL Server 2005. I've tried running the estimated execution plan, the actual execution plan and analyze query in database engine tuning advisor and I don't know what it is trying to tell me.
Using showplan_all on /off it looks like the table based function will use more cpu 1.157e-06 vs 8.3e-05 but the table function has a total subtree cost of 0.000830157 vs 0.01983356.
The query cost of the table valued function also seems to have a higher cost than the scalar one. Even though I thought it was supposed to be the better option.
So whilst I'd like to prove it myself which one gives the better performance - I'm just not sure what to look for in these tools - so any suggestions would be appreciated!
I need to get an academic year value (based on a date range set in the database) based on a calendar date so the function contents are below - so its just whether I go scalar or table based. This year feeds into other queries..
CREATE FUNCTION fn_AcademicYear
(
-- Add the parameters for the function here
@StartDate DateTime
)
RETURNS
@AcademicYear TABLE
(
AcademicYear int
)
AS
BEGIN
DECLARE @YearOffset int, @AcademicStartDate DateTime
-- Lookup Academic Year Starting Date
SELECT @AcademicStartDate = CONVERT(DateTime,[Value])
FROM dbo.SystemSetting
WHERE [Key] = 'AcademicYear.StartDate'
SET @YearOffset = DATEPART(YYYY,@StartDate) - DATEPART(YYYY,@AcademicStartDate);
-- try setting academic looking start date to year of the date passed in
SET @AcademicStartDate = DATEADD(YYYY, @YearOffset, @AcademicStartDate);
IF @StartDate < @AcademicStartDate
BEGIN
SET @AcademicStartDate = DATEADD(YYYY, @YearOffset-1, @AcademicStartDate);
END
INSERT @AcademicYear
SELECT YEAR(@AcademicStartDate)
RETURN
Thanks!!
You may not be seeing the performance gains you expect because your table-valued function is multifunction, not in-line. Multi-function TVFs have to be executed in the same way as scalar UDFs - once per row - so there's very little gain.
Following the example in this article by Itzik Ben-Gan (which discusses the benefits of in-line TVFs), set up the following test:
Create a numbers table with 1 million rows:
SET NOCOUNT ON;
IF OBJECT_ID('dbo.T1') IS NOT NULL DROP TABLE T1;
GO
WITH
L0 AS (SELECT 0 AS c UNION ALL SELECT 0),
L1 AS (SELECT 0 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS (SELECT 0 AS c FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS (SELECT 0 AS c FROM L2 AS A CROSS JOIN L2 AS B),
L4 AS (SELECT 0 AS c FROM L3 AS A CROSS JOIN L3 AS B),
L5 AS (SELECT 0 AS c FROM L4 AS A CROSS JOIN L4 AS B),
Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n FROM L5)
SELECT n INTO dbo.T1 FROM Nums WHERE n <= 1000000;
Run a million executions of your TVF using the following code:
set statistics time on
SELECT n,DATEADD(HOUR,n,'1900-01-01'),AY.AcademicYear
FROM T1
CROSS APPLY dbo.fn_AcademicYear(DATEADD(HOUR,n,'1900-01-01')) AS AY
set statistics time off
On my system, this showed an average of 83 seconds elapsed time for three executions, running DBCC dropcleanbuffers
between each execution.
If you perform a similar test for your scalar valued function, you should have a clearer idea of comparative performance.
The test also revealed what appears to be a bug in your function. If the AcademicYear.StartDate
is set to '2010-09-01', the Academic Year returned for an input of '1900-01-01' is 1789, where it seems like 1899 would be expected.
To get the best performance, you'd need to convert the TVF to be in-line - I came up with the following, which I believe corrects the bug:
CREATE FUNCTION fn_AcademicYear2
(
@StartDate DATETIME
)
RETURNS TABLE
AS
RETURN
(
-- Lookup Academic Year Starting Date
WITH dtCTE
AS
(
SELECT CONVERT(DATETIME,[Value]) AS dt
FROM dbo.SystemSetting
WHERE [KEY] = 'AcademicYear.StartDate'
)
SELECT CASE WHEN @StartDate >= DATEADD(YEAR,DATEDIFF(YEAR,dt,@StartDate),dt)
THEN YEAR(@StartDate)
ELSE YEAR(DATEADD(YEAR,DATEDIFF(YEAR,dt,@StartDate) - 1,dt))
END AS AcademicYear
FROM dtCTE
)
GO
This had an average elapsed time of 8.9 seconds over three runs - almost ten times faster.
The other thing to consider is that the performance benefit from using TVF will be negligible unless you are applying it to multiple rows, as in this test. If you're using it on one value at a time, you won't see a lot of benfit unless you have thousands of instances of the function executing in parallel.