Why do Scalar-valued functions seem to cause queries to run cumulatively slower the more times in succession that they are used?
I have this table that was built with data purchased from a 3rd party.
I've trimmed out some stuff to make this post shorter... but just so you get the idea of how things are setup.
CREATE TABLE [dbo].[GIS_Location](
[ID] [int] IDENTITY(1,1) NOT NULL, --PK
[Lat] [int] NOT NULL,
[Lon] [int] NOT NULL,
[Postal_Code] [varchar](7) NOT NULL,
[State] [char](2) NOT NULL,
[City] [varchar](30) NOT NULL,
[Country] [char](3) NOT NULL,
CREATE TABLE [dbo].[Address_Location](
[ID] [int] IDENTITY(1,1) NOT NULL, --PK
[Address_Type_ID] [int] NULL,
[Location] [varchar](100) NOT NULL,
[State] [char](2) NOT NULL,
[City] [varchar](30) NOT NULL,
[Postal_Code] [varchar](10) NOT NULL,
[Postal_Extension] [varchar](10) NULL,
[Country_Code] [varchar](10) NULL,
Then I have two functions that look up LAT and LON.
CREATE FUNCTION [dbo].[usf_GIS_GET_LAT]
(
@City VARCHAR(30),
@State CHAR(2)
)
RETURNS INT
WITH EXECUTE AS CALLER
AS
BEGIN
DECLARE @LAT INT
SET @LAT = (SELECT TOP 1 LAT FROM GIS_Location WITH(NOLOCK) WHERE [State] = @State AND [City] = @City)
RETURN @LAT
END
CREATE FUNCTION [dbo].[usf_GIS_GET_LON]
(
@City VARCHAR(30),
@State CHAR(2)
)
RETURNS INT
WITH EXECUTE AS CALLER
AS
BEGIN
DECLARE @LON INT
SET @LON = (SELECT TOP 1 LON FROM GIS_Location WITH(NOLOCK) WHERE [State] = @State AND [City] = @City)
RETURN @LON
END
When I run the following...
SET STATISTICS TIME ON
SELECT
dbo.usf_GIS_GET_LAT(City,[State]) AS Lat,
dbo.usf_GIS_GET_LON(City,[State]) AS Lon
FROM
Address_Location WITH(NOLOCK)
WHERE
ID IN (SELECT TOP 100 ID FROM Address_Location WITH(NOLOCK) ORDER BY ID DESC)
SET STATISTICS TIME OFF
100 ~= 8 ms, 200 ~= 32 ms, 400 ~= 876 ms
--Edit Sorry I should have been more clear. I'm not looking to tune the query listed above. This is just a sample to show the execution time getting slower the more records it crunches through. In the real world application the functions are used as part of a where clause to build a radius around a city and state to include all records with in that region.
In most cases, it's best to avoid scalar valued functions that reference tables because (as others said) they are basically black boxes that need to be ran once for every row, and cannot be optimized by the query plan engine. Therefore, they tend to scale linearly even if the associated tables have indexes.
You may want to consider using an inline-table-valued function, since they are evaluated inline with the query, and can be optimized. You get the encapsulation you want, but the performance of pasting the expressions right in the select statement.
As a side effect of being inlined, they can't contain any procedural code (no declare @variable; set @variable = ..; return). However, they can return several rows and columns.
You could re-write your functions something like this:
create function usf_GIS_GET_LAT(
@City varchar (30),
@State char (2)
)
returns table
as return (
select top 1 lat
from GIS_Location with (nolock)
where [State] = @State
and [City] = @City
);
GO
create function usf_GIS_GET_LON (
@City varchar (30),
@State char (2)
)
returns table
as return (
select top 1 LON
from GIS_Location with (nolock)
where [State] = @State
and [City] = @City
);
The syntax to use them is also a little different:
select
Lat.Lat,
Lon.Lon
from
Address_Location with (nolock)
cross apply dbo.usf_GIS_GET_LAT(City,[State]) AS Lat
cross apply dbo.usf_GIS_GET_LON(City,[State]) AS Lon
WHERE
ID IN (SELECT TOP 100 ID FROM Address_Location WITH(NOLOCK) ORDER BY ID DESC)