Using SQL DENSE_RANK to determine duplicates

user320587 picture user320587 · Jul 15, 2011 · Viewed 9.9k times · Source

Here is an example of the data for which I am trying to find all the orders with same quantities, ignoring the OrderID column

Product     Location      Customer    OrderID    Quantity
 Eggs        Chicago        XYZ        2011        10
 Eggs        Chicago        XYZ        2012        10
 Eggs        Chicago        XYZ        2013        15

So, I used DENSE_RANK function in the SQL

Select Product,Location,Customer,OrderID,Quantity,
Ranking = DENSE_RANK() OVER (PARTITION BY Product,Location,Customer,Quantity
ORDER BY OrderID ASC)
FROM MyTable

to get the data below

Product     Location      Customer    OrderID    Quantity Ranking
 Eggs        Chicago        XYZ        2011        10       1
 Eggs        Chicago        XYZ        2012        10       2
 Eggs        Chicago        XYZ        2013        15       1

So, based on the ranking I was able to filter out the records that have the same quantity across different orderIDs and treat them as one. So far everything is good and I am happy. But, one of another crazy requirement is this form of aggregation should be done only for the first change in quantity. For example, if the above data happens to be like one below

Product     Location      Customer    OrderID    Quantity
 Eggs        Chicago        XYZ        2011        10
 Eggs        Chicago        XYZ        2012        10
 Eggs        Chicago        XYZ        2013        15
 Eggs        Chicago        XYZ        2014        15
 Eggs        Chicago        XYZ        2015        15

The same SQL would produce result

Product     Location      Customer    OrderID    Quantity Ranking
 Eggs        Chicago        XYZ        2011        10       1
 Eggs        Chicago        XYZ        2012        10       2
 Eggs        Chicago        XYZ        2013        15       1
 Eggs        Chicago        XYZ        2013        15       2
 Eggs        Chicago        XYZ        2013        15       3

But, I would need the result to be

Product     Location      Customer    OrderID    Quantity Ranking
 Eggs        Chicago        XYZ        2011        10       1
 Eggs        Chicago        XYZ        2012        10       2
 Eggs        Chicago        XYZ        2013        15       1
 Eggs        Chicago        XYZ        2013        15       1
 Eggs        Chicago        XYZ        2013        15       1

Please, note the ranking remains 1 for all the records after the first change in quantity.

Is it possible to tweak my SQL to get the above behavior?

Thanks for any suggestions.

Answer

FistOfFury picture FistOfFury · May 23, 2013

If I understand you correctly, you want to use DENSE_RANK() to eliminate duplicate rows in your data.

It seems you’ve already solved your problem. If you want to eliminate the duplicates, use the same SQL code you have above and delete any rows with Ranking > 1. This will leave you with one copy of each row with the same unique key (e.g. Product, Location, Customer, OrderID).