My Postgres
query calculates statistical aggregate
from a bunch of sensor readings:
SELECT to_char(ipstimestamp, 'YYYYMMDDHH24') As row_name,
to_char(ipstimestamp, 'FMDD mon FMHH24h') As hour_row_name,
varid As category,
(AVG(ipsvalue)::NUMERIC(5,2)) ||', ' ||
(MAX(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' ||
(MIN(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' ||
(STDDEV(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' As StatisticsValue
FROM loggingdb_ips_integer As log
JOIN ipsobjects_with_parent ips ON log.varid = ips.objectid
AND (ipstimestamp > (now()- '2 days'::interval))
GROUP BY row_name, hour_row_name, category;
This works fine as long as I have >1 ipsvalue/hour
. If the hourly COUNT(ipsvalue)<2
, however, StatisticsValue
returns NULL
without any Postgres errors.
If I comment out STTDEV
, as in the following:
(AVG(ipsvalue)::NUMERIC(5,2)) ||', ' ||
(MAX(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' ||
(MIN(ipsvalue)::NUMERIC(5,2))::TEXT ||', ' As value
then all three stats are calculated correctly. I therefore conclude that an illegittimate STDDEV
brings down the whole query. I would rather have illegittimate STDDEVs returning 0. I tried to COALESCE
the STDDEV
line, to no avail. What can be done???
COALESCE
should work.
You could also use (it that fits you) the "population standard deviation" stddev_pop
, instead of the "sample standard deviation" stddev_samp
; the later is divides by n-1
and is aliased to STDDEV
. stddev_pop
, instead , divides by n
, and it returns zero (instead of NULL
) when given one sample.
If you don't know the difference between these estimators, it's explained in every statistic textbook, eg http://en.wikipedia.org/wiki/Standard_deviation#Estimation