Cumulative distinct count

user1495744 picture user1495744 · Nov 9, 2016 · Viewed 8.5k times · Source

I am working on query to get cumulative distinct count of uids on daily basis.

Example : Say there are 2 uids (100,200) appeared on date 2016-11-01 and they also appeared on next day with new uid 300 (100,200,300) on 2016-11-02 At this point i want store cumulative count to be 3 not 5 as (user id 100 and 200 already appeared on past day ).

Input table:

    date            uid         
2016-11-01          100
2016-11-01          200
2016-11-01          300
2016-11-01          400         
2016-11-02          100
2016-11-02          200                 
2016-11-03          300
2016-11-03          400
2016-11-03          500
2016-11-03          600
2016-11-04          700

Expected query result:

date            daily_cumulative_count
2016-11-01              4   
2016-11-02              4
2016-11-03              6
2016-11-04              7

Till now i am able to get cumulative distinct count per day but it includes previous distinct uids from previous day as well.

SELECT 
  date, 
  SUM(count) OVER (
    ORDER BY date ASC 
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  )
FROM (
  SELECT 
    date, 
    COUNT(DISTINCT uid) AS count
  FROM sample_table
  GROUP by 1
)ORDER BY date DESC;

Any kind of help would be greatly appreciated.

Answer

Stepan BLR picture Stepan BLR · May 15, 2018

easiest way:

SELECT *, count(*) over (order by fst_date ) cum_uids
  FROM (
SELECT uid, min(date) fst_date FROM t GROUP BY uid
 ) t

or something like this