SQL function very slow compared to query without function wrapper

Mmeyer picture Mmeyer · Jan 30, 2015 · Viewed 7.5k times · Source

I have this PostgreSQL 9.4 query that runs very fast (~12ms):

SELECT 
  auth_web_events.id, 
  auth_web_events.time_stamp, 
  auth_web_events.description, 
  auth_web_events.origin,  
  auth_user.email, 
  customers.name,
  auth_web_events.client_ip
FROM 
  public.auth_web_events, 
  public.auth_user, 
  public.customers
WHERE 
  auth_web_events.user_id_fk = auth_user.id AND
  auth_user.customer_id_fk = customers.id AND
  auth_web_events.user_id_fk = 2
ORDER BY
  auth_web_events.id DESC;

But if I embed it into a function, the query runs very slow through all data, seems that is running through every record, what am I missing?, I have ~1M of data and I want to simplify my database layer storing the large queries into functions and views.

CREATE OR REPLACE FUNCTION get_web_events_by_userid(int) RETURNS TABLE(
    id int,
    time_stamp timestamp with time zone,
    description text,
    origin text,
    userlogin text,
    customer text,
    client_ip inet
     ) AS
$func$
SELECT 
  auth_web_events.id, 
  auth_web_events.time_stamp, 
  auth_web_events.description, 
  auth_web_events.origin,  
  auth_user.email AS user, 
  customers.name AS customer,
  auth_web_events.client_ip
FROM 
  public.auth_web_events, 
  public.auth_user, 
  public.customers
WHERE 
  auth_web_events.user_id_fk = auth_user.id AND
  auth_user.customer_id_fk = customers.id AND
  auth_web_events.user_id_fk = $1
ORDER BY
  auth_web_events.id DESC;
  $func$ LANGUAGE SQL;

The query plan is:

"Sort  (cost=20.94..20.94 rows=1 width=791) (actual time=61.905..61.906 rows=2 loops=1)"
"  Sort Key: auth_web_events.id"
"  Sort Method: quicksort  Memory: 25kB"
"  ->  Nested Loop  (cost=0.85..20.93 rows=1 width=791) (actual time=61.884..61.893 rows=2 loops=1)"
"        ->  Nested Loop  (cost=0.71..12.75 rows=1 width=577) (actual time=61.874..61.879 rows=2 loops=1)"
"              ->  Index Scan using auth_web_events_fk1 on auth_web_events  (cost=0.57..4.58 rows=1 width=61) (actual time=61.860..61.860 rows=2 loops=1)"
"                    Index Cond: (user_id_fk = 2)"
"              ->  Index Scan using auth_user_pkey on auth_user  (cost=0.14..8.16 rows=1 width=524) (actual time=0.005..0.005 rows=1 loops=2)"
"                    Index Cond: (id = 2)"
"        ->  Index Scan using customers_id_idx on customers  (cost=0.14..8.16 rows=1 width=222) (actual time=0.004..0.005 rows=1 loops=2)"
"              Index Cond: (id = auth_user.customer_id_fk)"
"Planning time: 0.369 ms"
"Execution time: 61.965 ms"

I'm calling the funcion on this way:

SELECT * from get_web_events_by_userid(2)  

The query plan for the function:

"Function Scan on get_web_events_by_userid  (cost=0.25..10.25 rows=1000 width=172) (actual time=279107.142..279107.144 rows=2 loops=1)"
"Planning time: 0.038 ms"
"Execution time: 279107.175 ms"

EDIT: I just change the parameters, and the issue persist.
EDIT2: Query plan for the Erwin answer:

"Sort  (cost=20.94..20.94 rows=1 width=791) (actual time=0.048..0.049 rows=2 loops=1)"
"  Sort Key: w.id"
"  Sort Method: quicksort  Memory: 25kB"
"  ->  Nested Loop  (cost=0.85..20.93 rows=1 width=791) (actual time=0.030..0.037 rows=2 loops=1)"
"        ->  Nested Loop  (cost=0.71..12.75 rows=1 width=577) (actual time=0.023..0.025 rows=2 loops=1)"
"              ->  Index Scan using auth_user_pkey on auth_user u  (cost=0.14..8.16 rows=1 width=524) (actual time=0.011..0.012 rows=1 loops=1)"
"                    Index Cond: (id = 2)"
"              ->  Index Scan using auth_web_events_fk1 on auth_web_events w  (cost=0.57..4.58 rows=1 width=61) (actual time=0.008..0.008 rows=2 loops=1)"
"                    Index Cond: (user_id_fk = 2)"
"        ->  Index Scan using customers_id_idx on customers c  (cost=0.14..8.16 rows=1 width=222) (actual time=0.003..0.004 rows=1 loops=2)"
"              Index Cond: (id = u.customer_id_fk)"
"Planning time: 0.541 ms"
"Execution time: 0.101 ms"

Answer

Erwin Brandstetter picture Erwin Brandstetter · Jan 30, 2015

user

While rewriting your function I realized that you added column aliases here:

SELECT 
  ...
  auth_user.email AS user, 
  customers.name AS customer,

.. which wouldn't do anything to begin with, since those aliases are invisible outside the function and not referenced inside the function. So they would be ignored. For documentation purposes better use a comment.

But it also makes your query invalid, because user is a completely reserved word and cannot be used as column alias unless double-quoted.

Oddly, in my tests the function seems to work with the invalid alias. Probably because it is ignored (?). But I am not sure this couldn't have side effects.

Your function rewritten (otherwise equivalent):

CREATE OR REPLACE FUNCTION get_web_events_by_userid(int)
  RETURNS TABLE(
     id int
   , time_stamp timestamptz
   , description text
   , origin text
   , userlogin text
   , customer text
   , client_ip inet
  ) AS
$func$
SELECT w.id
     , w.time_stamp
     , w.description 
     , w.origin  
     , u.email     -- AS user   -- make this a comment!
     , c.name      -- AS customer
     , w.client_ip
FROM   public.auth_user       u
JOIN   public.auth_web_events w ON w.user_id_fk = u.id
JOIN   public.customers       c ON c.id = u.customer_id_fk 
WHERE  u.id = $1   -- reverted the logic here
ORDER  BY w.id DESC
$func$ LANGUAGE sql STABLE;

Obviously, the STABLE keyword changed the outcome. Function volatility should not be an issue in the test situation you describe. The setting does not normally profit a single, isolated function call. Read details in the manual. Also, standard EXPLAIN does not show query plans for what's going on inside functions. You could employ the additional module auto-explain for that:

You have a very odd data distribution:

auth_web_events table has 100000000 records, auth_user->2 records, customers-> 1 record

Since you didn't define otherwise, the function assumes an estimate of 1000 rows to be returned. But your function is actually returning only 2 rows. If all your calls only return (in the vicinity of) 2 rows, just declare that with an added ROWS 2. Might change the query plan for the VOLATILE variant as well (even if STABLE is the right choice anyway here).