SQL to determine minimum sequential days of access?

Jeff Atwood picture Jeff Atwood · Jul 24, 2009 · Viewed 11.8k times · Source

The following User History table contains one record for every day a given user has accessed a website (in a 24 hour UTC period). It has many thousands of records, but only one record per day per user. If the user has not accessed the website for that day, no record will be generated.

Id      UserId   CreationDate
------  ------   ------------
750997      12   2009-07-07 18:42:20.723
750998      15   2009-07-07 18:42:20.927
751000      19   2009-07-07 18:42:22.283

What I'm looking for is a SQL query on this table with good performance, that tells me which userids have accessed the website for (n) continuous days without missing a day.

In other words, how many users have (n) records in this table with sequential (day-before, or day-after) dates? If any day is missing from the sequence, the sequence is broken and should restart again at 1; we're looking for users who have achieved a continuous number of days here with no gaps.

Any resemblance between this query and a particular Stack Overflow badge is purely coincidental, of course.. :)

Answer

Rob Farley picture Rob Farley · Jul 24, 2009

How about (and please make sure the previous statement ended with a semi-colon):

WITH numberedrows
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY UserID 
                                       ORDER BY CreationDate)
                - DATEDIFF(day,'19000101',CreationDate) AS TheOffset,
                CreationDate,
                UserID
         FROM   tablename)
SELECT MIN(CreationDate),
       MAX(CreationDate),
       COUNT(*) AS NumConsecutiveDays,
       UserID
FROM   numberedrows
GROUP  BY UserID,
          TheOffset  

The idea being that if we have list of the days (as a number), and a row_number, then missed days make the offset between these two lists slightly bigger. So we're looking for a range that has a consistent offset.

You could use "ORDER BY NumConsecutiveDays DESC" at the end of this, or say "HAVING count(*) > 14" for a threshold...

I haven't tested this though - just writing it off the top of my head. Hopefully works in SQL2005 and on.

...and would be very much helped by an index on tablename(UserID, CreationDate)

Edited: Turns out Offset is a reserved word, so I used TheOffset instead.

Edited: The suggestion to use COUNT(*) is very valid - I should've done that in the first place but wasn't really thinking. Previously it was using datediff(day, min(CreationDate), max(CreationDate)) instead.

Rob