postgres STDDEV aggregate behavior when n<2

aag picture aag · Jun 26, 2012 · Viewed 9.1k times · Source

My Postgres query calculates statistical aggregate from a bunch of sensor readings:

SELECT to_char(ipstimestamp, 'YYYYMMDDHH24') As row_name, 
to_char(ipstimestamp, 'FMDD mon FMHH24h') As hour_row_name, 
varid As category, 

(AVG(ipsvalue)::NUMERIC(5,2)) ||', ' || 
(MAX(ipsvalue)::NUMERIC(5,2))::TEXT  ||', ' || 
(MIN(ipsvalue)::NUMERIC(5,2))::TEXT  ||', ' || 
(STDDEV(ipsvalue)::NUMERIC(5,2))::TEXT  ||', ' As StatisticsValue 

FROM loggingdb_ips_integer As log 
JOIN ipsobjects_with_parent ips ON log.varid = ips.objectid 
AND (ipstimestamp > (now()- '2 days'::interval)) 
GROUP BY row_name, hour_row_name, category;

This works fine as long as I have >1 ipsvalue/hour. If the hourly COUNT(ipsvalue)<2, however, StatisticsValue returns NULL without any Postgres errors.

If I comment out STTDEV, as in the following:

(AVG(ipsvalue)::NUMERIC(5,2)) ||', ' || 
(MAX(ipsvalue)::NUMERIC(5,2))::TEXT  ||', ' || 
(MIN(ipsvalue)::NUMERIC(5,2))::TEXT  ||', ' As value 

then all three stats are calculated correctly. I therefore conclude that an illegittimate STDDEV brings down the whole query. I would rather have illegittimate STDDEVs returning 0. I tried to COALESCE the STDDEV line, to no avail. What can be done???

Answer

leonbloy picture leonbloy · Jun 26, 2012

COALESCE should work.

You could also use (it that fits you) the "population standard deviation" stddev_pop, instead of the "sample standard deviation" stddev_samp; the later is divides by n-1 and is aliased to STDDEV. stddev_pop, instead , divides by n , and it returns zero (instead of NULL) when given one sample.

If you don't know the difference between these estimators, it's explained in every statistic textbook, eg http://en.wikipedia.org/wiki/Standard_deviation#Estimation