Count distinct values with OVER(PARTITION BY id)

user007 · Feb 12, 2014

Is it possible to count distinct values in conjunction with window functions like OVER(PARTITION BY id)? Currently my query is as follows:

SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
       congestion.id_element,
       ROW_NUMBER() OVER (
           PARTITION BY congestion.id_element
           ORDER BY congestion.date),
       COUNT(DISTINCT congestion.week_nb) OVER (
           PARTITION BY congestion.id_element
       ) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
  AND congestion.date <= '2014.12.31'
ORDER BY id_element, date

However, when I try to execute the query I get the following error:

"COUNT(DISTINCT": "DISTINCT is not implemented for window functions"

Answer

Simo Kivistö · Feb 21, 2015

No. As the error message states, DISTINCT is not implemented for window functions. Applying the info from this link to your case, you could compute the distinct count per id_element in a CTE and join it back to the rows:

WITH uniques AS (
    SELECT congestion.id_element,
           COUNT(DISTINCT congestion.week_nb) AS unique_references
    FROM congestion
    WHERE congestion.date >= '2014.01.01'
      AND congestion.date <= '2014.12.31'
    GROUP BY congestion.id_element
)
SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
       congestion.id_element,
       ROW_NUMBER() OVER (
           PARTITION BY congestion.id_element
           ORDER BY congestion.date),
       uniques.unique_references AS week_count
FROM congestion
JOIN uniques USING (id_element)
WHERE congestion.date >= '2014.01.01'
  AND congestion.date <= '2014.12.31'
ORDER BY id_element, date
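To sanity-check the CTE plus JOIN approach without a PostgreSQL server, here is a minimal sketch that runs essentially the same query against an in-memory SQLite database through Python's standard `sqlite3` module. Assumptions: the sample rows are hypothetical, dates are stored as `'YYYY.MM.DD'` text so the answer's string literals compare correctly, the `AS rn` alias on `ROW_NUMBER()` is an addition for readability, and the SQLite library must be 3.25 or newer for window functions. It checks the logic, not PostgreSQL-specific behavior.

```python
import sqlite3

# Window functions require SQLite 3.25.0 or newer.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("this sketch needs SQLite >= 3.25 for window functions")

# Hypothetical sample rows: (date, week_nb, id_congestion, id_element).
# Dates are 'YYYY.MM.DD' text, so the string comparisons in the query
# behave like date comparisons.
ROWS = [
    ("2014.01.02", 1, 1, 10),
    ("2014.01.03", 1, 2, 10),
    ("2014.01.09", 2, 3, 10),
    ("2014.02.01", 5, 4, 20),
    ("2013.12.30", 53, 5, 20),  # outside 2014, removed by the WHERE clause
]

# The answer's CTE + JOIN query; only the "AS rn" alias is added.
CTE_QUERY = """
WITH uniques AS (
    SELECT congestion.id_element,
           COUNT(DISTINCT congestion.week_nb) AS unique_references
    FROM congestion
    WHERE congestion.date >= '2014.01.01'
      AND congestion.date <= '2014.12.31'
    GROUP BY congestion.id_element
)
SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
       congestion.id_element,
       ROW_NUMBER() OVER (
           PARTITION BY congestion.id_element
           ORDER BY congestion.date) AS rn,
       uniques.unique_references AS week_count
FROM congestion
JOIN uniques USING (id_element)
WHERE congestion.date >= '2014.01.01'
  AND congestion.date <= '2014.12.31'
ORDER BY id_element, date
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE congestion ("
    "date TEXT, week_nb INTEGER, id_congestion INTEGER, id_element INTEGER)"
)
conn.executemany("INSERT INTO congestion VALUES (?, ?, ?, ?)", ROWS)

result = conn.execute(CTE_QUERY).fetchall()
for row in result:
    print(row)  # (date, week_nb, id_congestion, id_element, rn, week_count)
conn.close()
```

Every 2014 row of element 10 gets `week_count = 2` (weeks 1 and 2) while keeping its own row number, which is the per-partition distinct count that `COUNT(DISTINCT ...) OVER (...)` was meant to produce.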

Depending on the situation, you could also put a correlated subquery directly into the SELECT list:

SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
       congestion.id_element,
       ROW_NUMBER() OVER (
           PARTITION BY congestion.id_element
           ORDER BY congestion.date),
       (SELECT COUNT(DISTINCT dist_con.week_nb)
          FROM congestion AS dist_con
         WHERE dist_con.date >= '2014.01.01'
           AND dist_con.date <= '2014.12.31'
           AND dist_con.id_element = congestion.id_element) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
  AND congestion.date <= '2014.12.31'
ORDER BY id_element, date
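The correlated-subquery version can be sanity-checked with Python's standard `sqlite3` module and an in-memory table. The sketch below also tries a different, window-only workaround that is not part of this answer: within each partition, an ascending `DENSE_RANK()` over `week_nb` counts the distinct weeks up to the current row's week, a descending one counts those from it onward, so their sum minus 1 is the number of distinct weeks. That trick assumes `week_nb` is never NULL (a NULL would be ranked as one more distinct value, whereas `COUNT(DISTINCT ...)` ignores it). The sample rows, the `'YYYY.MM.DD'` text dates, the `AS rn` alias and the helper `run` are hypothetical, and SQLite 3.25 or newer is assumed for window functions.

```python
import sqlite3

# Window functions require SQLite 3.25.0 or newer.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("this sketch needs SQLite >= 3.25 for window functions")

# Hypothetical sample rows: (date, week_nb, id_congestion, id_element).
ROWS = [
    ("2014.01.02", 1, 1, 10),
    ("2014.01.03", 1, 2, 10),
    ("2014.01.09", 2, 3, 10),
    ("2014.02.01", 5, 4, 20),
    ("2013.12.30", 53, 5, 20),  # outside 2014, removed by the WHERE clause
]

# The answer's correlated subquery in the SELECT list ("AS rn" added).
SUBQUERY_VERSION = """
SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
       congestion.id_element,
       ROW_NUMBER() OVER (
           PARTITION BY congestion.id_element
           ORDER BY congestion.date) AS rn,
       (SELECT COUNT(DISTINCT dist_con.week_nb)
          FROM congestion AS dist_con
         WHERE dist_con.date >= '2014.01.01'
           AND dist_con.date <= '2014.12.31'
           AND dist_con.id_element = congestion.id_element) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
  AND congestion.date <= '2014.12.31'
ORDER BY id_element, date
"""

# Window-only alternative (assumes week_nb is never NULL): the current row's
# week is counted by both ranks, hence the "- 1".
DENSE_RANK_VERSION = """
SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
       congestion.id_element,
       ROW_NUMBER() OVER (
           PARTITION BY congestion.id_element
           ORDER BY congestion.date) AS rn,
       DENSE_RANK() OVER (PARTITION BY congestion.id_element
                          ORDER BY congestion.week_nb)
       + DENSE_RANK() OVER (PARTITION BY congestion.id_element
                            ORDER BY congestion.week_nb DESC)
       - 1 AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
  AND congestion.date <= '2014.12.31'
ORDER BY id_element, date
"""


def run(query: str) -> list:
    """Load the sample rows into a fresh in-memory table and run `query`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE congestion ("
            "date TEXT, week_nb INTEGER, id_congestion INTEGER, id_element INTEGER)"
        )
        conn.executemany("INSERT INTO congestion VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


sub_rows = run(SUBQUERY_VERSION)
rank_rows = run(DENSE_RANK_VERSION)
print(sub_rows == rank_rows)
```

On this data both queries return identical rows. The DENSE_RANK form avoids the second pass over `congestion` but needs extra sorts, and the correlated subquery may be evaluated once per outer row; which is faster on real data would need measuring.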