Get variance and standard deviation of two numbers in two different rows/columns with sqlite / PHP

Michael Meier picture Michael Meier · Jan 20, 2013 · Viewed 13.9k times · Source

I have a SQLite Database with the following structure:

rowid       ID                  startTimestamp   endTimestamp   subject
1           00:50:c2:63:10:1a   1000             1090           entrance
2           00:50:c2:63:10:1a   1100             1270           entrance
3           00:50:c2:63:10:1a   1300             1310           door1
4           00:50:c2:63:10:1a   1370             1400           entrance
.
.
.

I have prepared a sqlfiddle here: http://sqlfiddle.com/#!2/fe8c6/2

With this SQL-Query i can get the average differences between the endTime and the startTime between one row and the following row, sorted by subject and ID:

SELECT
    id,
    ( MAX(endtimestamp) - MIN(startTimestamp)
    - SUM(endtimestamp-startTimestamp)
    ) / (COUNT(*)-1) AS averageDifference
FROM
    table1
WHERE ID = '00:50:c2:63:10:1a'
AND subject = 'entrance'
GROUP BY id;

My problem: To calcute the average value is no problem, that does this query. But how can i get the standard deviation and the variance of this values?

Answer

CL. picture CL. · Jan 20, 2013

For formulas that are more complex than simple summation, you have to compute the actual difference values for each record by lookin up the corresponding next start times, like this:

SELECT (SELECT MIN(startTimestamp)
        FROM table1 AS next
        WHERE next.startTimestamp > table1.startTimestamp
          AND ID = '...'
       ) - endTimestamp AS timeDifference
FROM table1
WHERE nextStartTimestamp IS NOT NULL
  AND ID = '...'

Then you can use all the difference values to do the calculations:

SELECT SUM(timeDifference) / COUNT(*) AS average,
       AVG(timeDifference)            AS moreEfficientAverage,
       SUM(timeDifference * timeDifference) / COUNT(*) -
       AVG(timeDifference) * AVG(timeDifference) AS variance
FROM (SELECT (SELECT MIN(startTimestamp)
              FROM table1 AS next
              WHERE next.startTimestamp > table1.startTimestamp
                AND next.ID = '...'
             ) - endTimestamp AS timeDifference
      FROM table1
      WHERE nextStartTimestamp IS NOT NULL
        AND ID = '...')