Introduction
Window functions in SQL. operate on a set of rows and return a single aggregated value for each row.
The main advantage of using Window functions over regular aggregate functions is:
“Window functions do not cause rows to become grouped into a single output row, the rows retain their separate identities and an aggregated value will be added to each row”.
Different Types of Window functions
- Aggregate Window Functions
SUM(), MAX(), MIN(), AVG(). COUNT() - Ranking Window Functions
RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE() - Value Window Functions
LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()
General Syntax
window_function ( [ ALL ] expression )
OVER ( [ PARTITION BY partition_list ] [ ORDER BY order_list] )
Create Table for Data Analysis
CREATE TABLE faa.orders ( order_id INT, order_date DATE, customer_name VARCHAR(250), city VARCHAR(100), order_amount MONEY );
INSERT INTO faa.orders(order_id,order_date,customer_name,city,order_amount) values(1001,’04/01/2017′,’David Smith’,’GuildFord’,10000),
(1002,’04/02/2017′,’David Jones’,’Arlington’,20000),
(1003,’04/03/2017′,’John Smith’,’Shalford’,5000),
(1004,’04/04/2017′,’Michael Smith’,’GuildFord’,15000),
(1005,’04/05/2017′,’David Williams’,’Shalford’,7000),
(1006,’04/06/2017′,’Paum Smith’,’GuildFord’,25000),
(1007,’04/10/2017′,’Andrew Smith’,’Arlington’,15000),
(1008,’04/11/2017′,’David Brown’,’Arlington’,2000),
(1009,’04/20/2017′,’Robert Smith’,’Shalford’,1000),
(1010,’04/25/2017′,’Peter Smith’,’GuildFord’,500);
Aggregate Window Functions
SUM()
Lets find grand total of order amount using sum window function:-
SELECT
order_id, order_date, customer_name, city, order_amount ,SUM(order_amount)
OVER(PARTITION BY city) as grand_total FROM faa.Orders
AVG()
AVG or Average works in exactly the same way with a Window function.
The following query will give you average order amount for each city and for each month :-
SELECT order_id, order_date, customer_name, city, order_amount ,AVG(order_amount::money::numeric) OVER(PARTITION BY city) as average_order_amount FROM faa.Orders;
Ranking Window Functions
RANKING functions will rank the values of a specified field and categorize them according to their rank.
The most common use of RANKING functions is to find the top (N) records based on a certain value.
For example, Top 10 highest paid employees, Top 10 ranked students, Top 50 largest orders etc.
The following are supported RANKING functions:
RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE()
Let’s discuss them one by one.
RANK()
The RANK() function is used to give a unique rank to each record based on a specified value, for example salary, order amount etc.
If two records have the same value then the RANK() function will assign the same rank to both records by skipping the next rank. This means – if there are two identical values at rank 2, it will assign the same rank 2 to both records and then skip rank 3 and assign rank 4 to the next record.
Let’s rank each order by their order amount.
SELECT order_id,order_date,customer_name,city, RANK() OVER(ORDER BY order_amount DESC) Ranked FROM faa.Orders
DENSE_RANK()
The DENSE_RANK() function is identical to the RANK() function except that it does not skip any rank. This means that if two identical records are found then DENSE_RANK() will assign the same rank to both records but not skip then skip the next rank.
Let’s see how this works:-
SELECT order_id,order_date,customer_name,city, order_amount,
DENSE_RANK() OVER(ORDER BY order_amount DESC) DenseRank
FROM faa.Orders
NTILE()
NTILE() is a very helpful window function. It helps you to identify what percentile (or quartile, or any other subdivision) a given row falls into.
This means that if you have 100 rows and you want to create 4 quartiles based on a specified value field you can do so easily and see how many rows fall into each quartile.
Let’s see an example. In the query below, we have specified that we want to create four quartiles based on order amount. We then want to see how many orders fall into each quartile.
SELECT order_id,order_date,customer_name,city, order_amount, NTILE(4) OVER(ORDER BY order_amount) quartile FROM faa.Orders;
LAG() and LEAD()
LEAD() and LAG() functions are very powerful but can be complex to explain. The LAG function allows to access data from the previous row in the same result set without use of any SQL joins. You can see in below example, using LAG function we found previous order date.
Script to find previous order date using LAG() function:
SELECT order_id,customer_name,city, order_amount,order_date, LAG(order_date,1) OVER(ORDER BY order_date) prev_order_date FROM faa.Orders;
LEAD function allows to access data from the next row in the same result set without use of any SQL joins. You can see in below example, using LEAD function we found next order date.
Script to find next order date using LEAD() function:
SELECT order_id,customer_name,city, order_amount,order_date, LEAD(order_date,1) OVER(ORDER BY order_date) next_order_date FROM faa.Orders;