Window functions are a powerful feature in SQL used to perform calculations across a set of rows related to the current row. Unlike aggregate functions, window functions do not group rows into a single output; they return a result for each row while maintaining the context of the dataset.
In this article, we’ll explore some commonly used SQL window functions (ROW_NUMBER()
, RANK()
, DENSE_RANK()
, NTILE()
, LEAD()
, and LAG()
) with examples.
We’ll use the following Sales table to demonstrate window functions:
SalesID |
CustomerID |
Product |
Region |
Amount |
SaleDate |
---|---|---|---|---|---|
1 |
101 |
Laptop |
North |
1200 |
2023-01-05 |
2 |
102 |
Tablet |
North |
800 |
2023-02-15 |
3 |
103 |
Phone |
North |
800 |
2023-03-10 |
4 |
104 |
Tablet |
North |
500 |
2023-04-01 |
5 |
105 |
Laptop |
South |
1300 |
2023-05-05 |
6 |
106 |
Tablet |
South |
700 |
2023-06-20 |
7 |
107 |
Phone |
West |
900 |
2023-07-15 |
8 |
108 |
Laptop |
East |
1300 |
2023-08-10 |
The ROW_NUMBER() function assigns a unique number to each row within a partition, ordered by a specified column.
Task: Assign a unique row number to each sale within a region based on the sale amount (highest to lowest).
SELECT SalesID, Region, Amount,
ROW_NUMBER() OVER (PARTITION BY Region ORDER BY Amount DESC) AS RowNum
FROM Sales;
Result:
SalesID |
Region |
Amount |
RowNum |
---|---|---|---|
1 |
North |
1200 |
1 |
2 |
North |
800 |
2 |
3 |
North |
800 |
3 |
4 |
North |
500 |
4 |
5 |
South |
1300 |
1 |
6 |
South |
700 |
2 |
7 |
West |
900 |
1 |
8 |
East |
1300 |
1 |
The RANK() function assigns a rank to each row within a partition. Rows with the same values receive the same rank, and the next rank is skipped.
Task: Rank sales within each region by amount (highest to lowest).
SELECT SalesID, Region, Amount,
RANK() OVER (PARTITION BY Region ORDER BY Amount DESC) AS Rank
FROM Sales;
Result:
SalesID |
Region |
Amount |
Rank |
---|---|---|---|
1 |
North |
1200 |
1 |
2 |
North |
800 |
2 |
3 |
North |
800 |
2 |
4 |
North |
500 |
4 |
5 |
South |
1300 |
1 |
6 |
South |
700 |
2 |
7 |
West |
900 |
1 |
8 |
East |
1300 |
1 |
Key Feature:
The DENSE_RANK() function assigns ranks like RANK(), but it doesn’t skip ranks after ties.
Task: Assign dense ranks to sales within each region by amount (highest to lowest).
SELECT SalesID, Region, Amount,
DENSE_RANK() OVER (PARTITION BY Region ORDER BY Amount DESC) AS DenseRank
FROM Sales;
Result:
SalesID |
Region |
Amount |
DenseRank |
---|---|---|---|
1 |
North |
1200 |
1 |
2 |
North |
800 |
2 |
3 |
North |
800 |
2 |
4 |
North |
500 |
3 |
5 |
South |
1300 |
1 |
6 |
South |
700 |
2 |
7 |
West |
900 |
1 |
8 |
East |
1300 |
1 |
Key Feature:
NTILE() divides rows into a specified number of approximately equal groups.
Task: Divide all sales into 4 groups based on Amount in descending order.
SELECT SalesID, Amount,
NTILE(4) OVER (ORDER BY Amount DESC) AS Quartile
FROM Sales;
Result:
SalesID |
Amount |
Quartile |
---|---|---|
5 |
1300 |
1 |
8 |
1300 |
1 |
1 |
1200 |
2 |
7 |
900 |
2 |
2 |
800 |
3 |
3 |
800 |
3 |
4 |
500 |
4 |
6 |
700 |
4 |
LEAD() retrieves the value from the next row within the same partition.
Task: Compare each sale amount to the next sale amount, ordered by SaleDate.
SELECT SalesID, Amount,
LEAD(Amount) OVER (ORDER BY SaleDate) AS NextAmount
FROM Sales;
Result:
SalesID |
Amount |
NextAmount |
---|---|---|
1 |
1200 |
800 |
2 |
800 |
800 |
3 |
800 |
500 |
4 |
500 |
1300 |
5 |
1300 |
700 |
6 |
700 |
900 |
7 |
900 |
1300 |
8 |
1300 |
NULL |
LAG()
retrieves the value from the previous row within the same partition.
Task: Compare each sale amount to the previous sale amount, ordered by SaleDate.
SELECT SalesID, Amount,
LAG(Amount) OVER (ORDER BY SaleDate) AS PrevAmount
FROM Sales;
Result:
SalesID |
Amount |
PrevAmount |
---|---|---|
1 |
1200 |
NULL |
2 |
800 |
1200 |
3 |
800 |
800 |
4 |
500 |
800 |
5 |
1300 |
500 |
6 |
700 |
1300 |
7 |
900 |
700 |
8 |
1300 |
900 |
SQL window functions like ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), LEAD(), and LAG() provide powerful ways to analyze data within partitions.
Key Takeaways:
ROW_NUMBER()
assigns a unique identifier for each row.RANK()
and DENSE_RANK()
differ in how they handle ties (skipping vs. no skipping).NTILE()
is useful for dividing rows into statistic groups.LEAD()
and LAG()
allow comparisons with adjacent rows.
By mastering these functions, you can handle complex analytics and ranking tasks effectively!
Thank you for taking the time to explore data-related insights with me. I appreciate your engagement. If you find this information helpful, I invite you to follow me or connect with me on LinkedIn. Happy exploring!👋