I have this PostgreSQL 9.4 query that runs very fast (~12ms):
SELECT
auth_web_events.id,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email,
customers.name,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk = 2
ORDER BY
auth_web_events.id DESC;
But if I embed it in a function, the query runs very slowly; it seems to be going through every record. What am I missing? I have ~1M rows, and I want to simplify my database layer by storing the large queries in functions and views.
CREATE OR REPLACE FUNCTION get_web_events_by_userid(int) RETURNS TABLE(
id int,
time_stamp timestamp with time zone,
description text,
origin text,
userlogin text,
customer text,
client_ip inet
) AS
$func$
SELECT
auth_web_events.id,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email AS user,
customers.name AS customer,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk = $1
ORDER BY
auth_web_events.id DESC;
$func$ LANGUAGE SQL;
The query plan is:
"Sort (cost=20.94..20.94 rows=1 width=791) (actual time=61.905..61.906 rows=2 loops=1)"
" Sort Key: auth_web_events.id"
" Sort Method: quicksort Memory: 25kB"
" -> Nested Loop (cost=0.85..20.93 rows=1 width=791) (actual time=61.884..61.893 rows=2 loops=1)"
" -> Nested Loop (cost=0.71..12.75 rows=1 width=577) (actual time=61.874..61.879 rows=2 loops=1)"
" -> Index Scan using auth_web_events_fk1 on auth_web_events (cost=0.57..4.58 rows=1 width=61) (actual time=61.860..61.860 rows=2 loops=1)"
" Index Cond: (user_id_fk = 2)"
" -> Index Scan using auth_user_pkey on auth_user (cost=0.14..8.16 rows=1 width=524) (actual time=0.005..0.005 rows=1 loops=2)"
" Index Cond: (id = 2)"
" -> Index Scan using customers_id_idx on customers (cost=0.14..8.16 rows=1 width=222) (actual time=0.004..0.005 rows=1 loops=2)"
" Index Cond: (id = auth_user.customer_id_fk)"
"Planning time: 0.369 ms"
"Execution time: 61.965 ms"
I'm calling the function this way:
SELECT * from get_web_events_by_userid(2)
The query plan for the function:
"Function Scan on get_web_events_by_userid (cost=0.25..10.25 rows=1000 width=172) (actual time=279107.142..279107.144 rows=2 loops=1)"
"Planning time: 0.038 ms"
"Execution time: 279107.175 ms"
EDIT: I just changed the parameters, and the issue persists.
EDIT2: Query plan for the query from Erwin's answer:
"Sort (cost=20.94..20.94 rows=1 width=791) (actual time=0.048..0.049 rows=2 loops=1)"
" Sort Key: w.id"
" Sort Method: quicksort Memory: 25kB"
" -> Nested Loop (cost=0.85..20.93 rows=1 width=791) (actual time=0.030..0.037 rows=2 loops=1)"
" -> Nested Loop (cost=0.71..12.75 rows=1 width=577) (actual time=0.023..0.025 rows=2 loops=1)"
" -> Index Scan using auth_user_pkey on auth_user u (cost=0.14..8.16 rows=1 width=524) (actual time=0.011..0.012 rows=1 loops=1)"
" Index Cond: (id = 2)"
" -> Index Scan using auth_web_events_fk1 on auth_web_events w (cost=0.57..4.58 rows=1 width=61) (actual time=0.008..0.008 rows=2 loops=1)"
" Index Cond: (user_id_fk = 2)"
" -> Index Scan using customers_id_idx on customers c (cost=0.14..8.16 rows=1 width=222) (actual time=0.003..0.004 rows=1 loops=2)"
" Index Cond: (id = u.customer_id_fk)"
"Planning time: 0.541 ms"
"Execution time: 0.101 ms"
user
While rewriting your function I realized that you added column aliases here:
SELECT
...
auth_user.email AS user,
customers.name AS customer,
.. which wouldn't do anything to begin with, since those aliases are invisible outside the function and not referenced inside it, so they are ignored. For documentation purposes, use a comment instead.
But user is also a reserved word in PostgreSQL and should not be used as a column alias without double-quoting.
In my tests the function works with the unquoted alias anyway, presumably because PostgreSQL accepts even reserved words as column labels after an explicit AS. I would still not rely on it, and I am not sure it couldn't have side effects.
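To illustrate the point about user, here is a minimal sketch against the auth_user table from the question. The id value 2 is taken from the question; the alias userlogin simply reuses the output column name already declared in RETURNS TABLE.

```sql
-- USER is a built-in in PostgreSQL: this returns the current role name,
-- not a column called "user".
SELECT user;

-- Double-quoting turns it into an ordinary identifier, so the alias is unambiguous.
SELECT u.email AS "user"
FROM   public.auth_user u
WHERE  u.id = 2;

-- Simpler: avoid the reserved word altogether.
SELECT u.email AS userlogin
FROM   public.auth_user u
WHERE  u.id = 2;
```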
Your function rewritten (otherwise equivalent):
CREATE OR REPLACE FUNCTION get_web_events_by_userid(int)
RETURNS TABLE(
id int
, time_stamp timestamptz
, description text
, origin text
, userlogin text
, customer text
, client_ip inet
) AS
$func$
SELECT w.id
, w.time_stamp
, w.description
, w.origin
, u.email -- AS user -- make this a comment!
, c.name -- AS customer
, w.client_ip
FROM public.auth_user u
JOIN public.auth_web_events w ON w.user_id_fk = u.id
JOIN public.customers c ON c.id = u.customer_id_fk
WHERE u.id = $1 -- reverted the logic here
ORDER BY w.id DESC
$func$ LANGUAGE sql STABLE;
Obviously, the STABLE keyword changed the outcome. Function volatility should not be an issue in the test situation you describe. The setting does not normally benefit a single, isolated function call. Read the details in the manual. Also, standard EXPLAIN does not show query plans for what is going on inside functions. You could use the additional module auto_explain for that.
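A minimal sketch of how auto_explain could be switched on for a single session to see the plan used inside the function. It assumes superuser access (loading the module this way requires it) and that you can read the server log, which is where the plans are written. The threshold of 0 is just a convenient choice for testing.

```sql
-- Load the module for the current session only (requires superuser).
LOAD 'auto_explain';

-- Log the plan of every statement, however fast it is.
SET auto_explain.log_min_duration = 0;
-- Include actual row counts and timings, like EXPLAIN ANALYZE.
SET auto_explain.log_analyze = true;
-- Also log statements executed inside functions.
SET auto_explain.log_nested_statements = true;

-- Run the call; the nested plan shows up in the server log, not in the client.
SELECT * FROM get_web_events_by_userid(2);
```

Comparing the logged plan with the plan of the plain query should show whether the function body is using the index on user_id_fk or scanning the whole table.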
You have a very odd data distribution:
auth_web_events table has 100000000 records, auth_user->2 records, customers-> 1 record
Since you didn't declare otherwise, the planner assumes an estimate of 1000 rows to be returned by the function. But your function actually returns only 2 rows. If all your calls return only (in the vicinity of) 2 rows, just declare that with an added ROWS 2. That might change the query plan for the VOLATILE variant as well (even if STABLE is the right choice here anyway).
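As a sketch of the two declarations discussed above, assuming the function already exists with the signature from the question, both can be set without recreating it. The value 2 is only appropriate if calls really return about two rows.

```sql
-- Tell the planner to expect about 2 rows instead of the default 1000.
ALTER FUNCTION get_web_events_by_userid(int) ROWS 2;

-- Mark the function STABLE (it only reads data).
ALTER FUNCTION get_web_events_by_userid(int) STABLE;

-- Check what is currently declared:
-- provolatile is 'v' (VOLATILE), 's' (STABLE) or 'i' (IMMUTABLE).
SELECT proname, provolatile, prorows
FROM   pg_proc
WHERE  proname = 'get_web_events_by_userid';

-- Compare the planner's estimate with the actual row count.
EXPLAIN ANALYZE
SELECT * FROM get_web_events_by_userid(2);
```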