Difference between row_number(), rank(), and dense_rank()

Found a great article that reviews the differences between these 3 useful window functions. I've adapted the theoretical data to provide a more practical (albeit simplified) business context.

All coding credit goes to the authors of the article. Highly recommend checking it out for a more comprehensive explanation of each function!

Suppose you have a list of customers and their respective monthly purchases, like so:

customer_id    mon_purch
     1            200
     2            150
     3            200
     4            300
     5            100
     6            400
     7            200
     8            300

You've been tasked with ranking these customers so that Marketing can prioritize their outreach effectively (e.g. varying offers for future purchases based on last month's purchases).

You could simply write a query to order by mon_purch desc, customer_id and send it to Marketing:

customer_id    mon_purch
     6            400
     4            300
     8            300
     1            200
     3            200
     7            200
     2            150
     5            100

But that solution is lacking for a few reasons.

Firstly, Marketing doesn't have the ability to slice the output based on condition (e.g. 'Let's send an offer to our 20th-50th best customers'). 

Secondly, as seen above, some customers have made identical purchases last month. You might need a way to handle ties.

Enter row_number(), rank(), and dense_rank()!

Here's how you could rank your customers.


  , mon_purch
  , row_number() over (mon_purch desc, customer_id)
  , rank()       over (mon_purch desc, customer_id)
  , dense_rank() over (mon_purch desc, customer_id)
from customer_purchases
order by 2 desc, 3


 customer_id  mon_purch  row_number  rank  dense_rank
      6          400         1        1        1
      4          300         2        2        2
      8          300         3        2        2
      1          200         4        4        3
      3          200         5        4        3
      7          200         6        4        3
      2          150         7        7        4
      5          100         8        8        5

Using row_number():

  • Marketing does not consider ties. In fact, assuming your customer IDs are assigned in ascending chronological order, this suggests that Marketing actually favours older customers.
    • Tip: if you wanted to favour younger customers, add , 1 desc to your order by clause

Using rank():

  • Marketing now treats ties as equal but ranks will be skipped with each change in mon_purch (e.g. there is no rank 3 in the table above). The final rank will be equal to the number of customers in the table.
    • Caution: the distribution of data can affect this method. For example, a set where 1 customer spent $400, 13 customers spent $375, and 1 spent $350 would have ranks 1,2, and 15, respectively. You'd have to consider if the customer ranked 15 is actually significantly different from customers ranked 2 when the difference between 2 and 15 ($25) is the same as the difference between 1 and 2 ($25).

Using dense_rank():

  • Marketing treats ties as equal but ranks are not skipped with each change in mon_purch. The final rank will depend on the size of the data set and the number of tie breaks.
    • This method generally works better with tighter distributions. Always a good idea to explore the data visually first before choosing a ranking method!

So there you have it! Of course, this is just a simplified example. There are many other factors and methods to consider when ranking customers.

Other factors might include customer age, location, and order history.

Other methods for prioritization might include ntile() for binning or lag() to calculate percentage change with a case statement to assign custom rankings based on the outcome. Perhaps in a future post ๐Ÿ™‚ .

Happy coding!

Reply Oldest first
  • Oldest first
  • Newest first
  • Active threads
  • Popular
Like Follow
  • 1 yr agoLast active
  • 1834Views
  • 1 Following