What Are SQL Window Functions?
Table of Contents
- A Brief History of SQL Window Functions
- What’s a Window?
- SQL Window Function Syntax
- Dataset and Syntax Examples
- What Are the Most Common Window Functions?
- Real-World SQL Window Function Examples
- SQL Window Functions: A Window to Better Data Analysis
Need to up your data analysis game? Learn these SQL window functions here and you’ll take your analysis skills to the next level. Practice exercises and detailed explanations included!
When I first heard about SQL window functions, I thought it was some strange marriage between SQL and Windows. I was wrong. The SQL window functions have nothing to do with that famous operating system. They are SQL functions that do computations on the set of rows related to the current row. This set of rows is called a window or window frame – hence the function name.
You might also hear about windowing functions in SQL, analytical functions, or the OVER() functions. These are all just alternative names for SQL window functions – an extremely helpful set of tools for data analysis.
In this article, we’ll show you what you can do with window functions and how. I’ll start with a short history of SQL window functions and explain why they are called that. Then, I’ll guide you through the syntax and show you how it works with several examples. After practicing syntax, we’ll be ready for real-life window function examples from a data analyst’s life. And here’s the juiciest part: after each example, there’s an exercise for you to solve and learn through coding.
However, the main source of your knowledge on this subject should be our Window Functions course. Its 218 interactive exercises cover SQL window functions in detail. In other words, you’ll learn about window frames and the OVER(), PARTITION BY, and ORDER BY clauses. All this is necessary to aggregate, rank, and analyze data using window functions.
A Brief History of SQL Window Functions
Window functions were first introduced in the Oracle8i Database, which was released in 1998. However, they were included in the SQL standard five years later with SQL:2003.
Then Microsoft included them in SQL Server 2005. Other database management systems (DBMS) followed; PostgreSQL has supported them since PostgreSQL 8.4 was released in 2009; MariaDB included them with the 10.2 version (2016), and MySQL added them to version 8 in 2018.
Window functions are a rather new feature in SQL. Because of that, they are not part of the usual SQL curriculum. By learning them, you’ll be ahead of the curve compared to many SQL users.
What’s a Window?
A set of rows related to the current row is called a window or a window frame. Hence, the name of these functions: their result is based on a sliding window frame.
For example, you can calculate a cumulative sum as shown below:
| date | sales | cumulative_sum |
|---|---|---|
| 2023-10-01 | 4,241 | 4,241 |
| 2023-10-02 | 2,389 | 6,630 |
| 2023-10-03 | 1,580 | 8,210 |
| 2023-10-04 | 3,395 | 11,605 |
| 2023-10-05 | 1,265 | 12,870 |
The window for the 2023-10-04 cumulative sum is highlighted in green. It includes the current row (for 2023-10-04) and all the previous rows. So the cumulative sum is calculated as the sum of all the previous and current sales: 4,241 + 2,389 + 1,580 + 3,395 = 11,605. (Note that the row outlined in red dots is not included in the window or the sum.)
When we move to the next row, the window also moves: it will now include all the previous rows (green) and the current row (red dotted). Now the cumulative sum is 4,241 + 2,389 + 1,580 + 3,395 + 1,265 = 12,870.
So the window is the set of rows related to the current row that are used in computations for this row. The window changes (slides) as we move across the rows; thanks to these images of a sliding window, we get the name of these functions.
SQL Window Function Syntax
The syntax for window functions is:
SELECT column_1,
column_2,
<window_function> OVER(PARTITION BY … ORDER BY … ) AS column_alias
FROM table;
Here’s what each part does:
<window_function>– Specifies the function to apply to that window.OVER()– Defines the window (set of rows) and indicates that this is a window function; without this clause, it’s not a window function.<window_frame>– Defines the window frame size (optional).PARTITION BY– Divides the window into smaller groups called partitions (optional); if omitted, the whole result set is one partition.- ORDER BY – Sorts rows within the window frame (optional), i.e., decides in which order the window operation will be performed; if omitted, the order of rows within the partition is arbitrary.
Additional clauses can further define the window. Their syntax is:
[<ROWS or RANGE clause> BETWEEN <lower_bound> AND <upper_bound>]
The ROWS clause defines the window in terms of the fixed number of rows in relation to the current row.
The RANGE clause does the same. But it also takes into calculation all the rows with the same values in the columns specified in the ORDER BY clause as the current row.
The window bounds can be defined as
UNBOUNDED PRECEDING– All the rows before the current row.n PRECEDING– A defined number of rows before the current row.CURRENT ROW– Includes the current row.n FOLLOWING– A defined number of rows after the current row.UNBOUNDED FOLLOWING– All the rows after the current row.
Let’s now see how this works in practice.
Dataset and Syntax Examples
We’ll use the table album_catalogue in all these examples. You can create it yourself using this script. A data snapshot is shown below:
| id | album_title | album_length | album_genre | artist | copies_sold | sales_period |
|---|---|---|---|---|---|---|
| 1 | Wednesday Morning, 3 A.M | 0:31:38 | Folk | Simon & Garfunkel | 1043 | 2022_1Q |
| 2 | EnRoute: John Scofield Trio LIVE | 1:13:48 | Jazz | John Scofield Trio | 512 | 2022_1Q |
| 3 | Nasty Gal | 0:39:15 | Funk | Betty Davis | 809 | 2022_1Q |
| 4 | The New Folk Sound of Terry Callier | 0:37:41 | Folk | Terry Callier | 903 | 2022_1Q |
| 5 | In a Silent Way | 0:38:08 | Jazz | Miles Davis | 428 | 2022_1Q |
The dataset is a list of albums with their length, genre, artist, and sales data, including the number of copies sold and the period (quarters). The data goes all the way to the third quarter of 2023.
I’ll first show you several examples, explaining each crucial part of the windowing functions’ syntax along the way.
Syntax Example #1: OVER ()
You can use the SUM() window function with only the OVER() clause to get the total sales in the fourth quarter of 2022:
SELECT sales_period, album_title, artist, copies_sold, SUM (copies_sold) OVER() AS sold_in_4Q_2022 FROM album_catalogue WHERE sales_period = '2022_4Q';
I want the sum of the copies sold, so I specify this column in SUM(). The OVER() clause is mandatory. If you want to use OVER() without any of the optional clauses, just leave the parentheses empty.
I use WHERE to output only data from the desired quarter.
When you write a query like this – with an empty OVER() – the whole result set (selected columns, applied filters, etc.) is taken into account when performing the window function calculations. Here, the result shows the individual sales of every album sold in the fourth quarter of 2022. It also shows the total sales of all albums sold in that period.
| sales_period | album_title | artist | copies_sold | sold_in_4q_2022 |
|---|---|---|---|---|
| 2022_4Q | Wednesday Morning, 3 A.M | Simon & Garfunkel | 809 | 7,403 |
| 2022_4Q | EnRoute: John Scofield Trio LIVE | John Scofield Trio | 612 | 7,403 |
| 2022_4Q | Nasty Gal | Betty Davis | 369 | 7,403 |
| 2022_4Q | The New Folk Sound of Terry Callier | Terry Callier | 214 | 7,403 |
| 2022_4Q | In a Silent Way | Miles Davis | 65 | 7,403 |
| 2022_4Q | Cold Sweat | James Brown | 209 | 7,403 |
| 2022_4Q | The Freewheelin' Bob Dylan | Bob Dylan | 246 | 7,403 |
| 2022_4Q | My Favorite Things | John Coltrane | 377 | 7,403 |
| 2022_4Q | A Whole New Thing | Sly and the Family Stone | 816 | 7,403 |
| 2022_4Q | Five Leaves Left | Nick Drake | 400 | 7,403 |
| 2022_4Q | Head Hunters | Herbie Hancock | 409 | 7,403 |
| 2022_4Q | In the Right Place | Dr. John | 912 | 7,403 |
| 2022_4Q | Blue | Joni Mitchell | 412 | 7,403 |
| 2022_4Q | Concierto | Jim Hall | 612 | 7,403 |
| 2022_4Q | Dirty Mind | Prince | 941 | 7,403 |
With the help of SUM() and OVER(), I’m able to show each individual album’s sales and quarterly total.
Syntax Example #2: OVER (ORDER BY)
You can add additional clauses inside the OVER() clause to change the definition of the window frame. One such clause is ORDER BY. The ORDER BY clause defines the sorting of rows within a window frame: the rows can be processed by the window function in a given order.
Let’s see an example. You can calculate the cumulative sum by adding ORDER BY to the previous query. In this example, I want to see how the album ‘In the Right Place’ sells over time and the cumulative number of albums sold up to a given period of time. Here’s the query:
SELECT sales_period, album_title, artist, copies_sold, SUM (copies_sold) OVER(ORDER BY sales_period ASC) AS cumulative_sum FROM album_catalogue WHERE album_title = 'In the Right Place';
The same column copies_sold is in SUM(). This time, OVER() contains the ORDER BY clause. You want to show the cumulative sales from the earliest to the latest quarter. That’s why you need sales_period and ASC in ORDER BY.
| sales_period | album_title | artist | copies_sold | cumulative_sum |
|---|---|---|---|---|
| 2022_1Q | In the Right Place | Dr. John | 222 | 222 |
| 2022_2Q | In the Right Place | Dr. John | 208 | 430 |
| 2022_3Q | In the Right Place | Dr. John | 94 | 524 |
| 2022_4Q | In the Right Place | Dr. John | 912 | 1436 |
| 2023_1Q | In the Right Place | Dr. John | 912 | 2348 |
| 2023_2Q | In the Right Place | Dr. John | 56 | 2404 |
| 2023_3Q | In the Right Place | Dr. John | 562 | 2966 |
In each row, you can see the sales for each quarter and the cumulative sum, i.e., the sum of the current and all the previous quarters. For instance, the album sold 94 copies in the third quarter of 2022. The total sales in 2022 up until then (or in three quarters) is: 222 + 208 + 94 = 524.
Syntax Example #3: OVER (PARTITION BY)
Another clause you can use in OVER() is PARTITION BY. PARTITION BY is used to divide the window into smaller segments based on some criteria. For example, you can list the albums, their sales data for the fourth quarter of 2022, and the sales by genre for that quarter:
SELECT album_title, artist, copies_sold, album_genre, SUM (copies_sold) OVER(PARTITION BY album_genre) AS sales_by_genre FROM album_catalogue WHERE sales_period = '2022_4Q';
Again, we use the same window function SUM(). This time, though, we use PARTITION BY to divide the window into smaller segments based on album genre. Everything else stays the same.
The query returns the result below. It’s an analysis of album sales data by genre for the last quarter of 2022.
| album_title | artist | copies_sold | album_genre | sales_by_genre |
|---|---|---|---|---|
| Wednesday Morning, 3 A.M | Simon & Garfunkel | 809 | Folk | 2,081 |
| The Freewheelin' Bob Dylan | Bob Dylan | 246 | Folk | 2,081 |
| Five Leaves Left | Nick Drake | 400 | Folk | 2,081 |
| The New Folk Sound of Terry Callier | Terry Callier | 214 | Folk | 2,081 |
| Blue | Joni Mitchell | 412 | Folk | 2,081 |
| Dirty Mind | Prince | 941 | Funk | 3,247 |
| Nasty Gal | Betty Davis | 369 | Funk | 3,247 |
| Cold Sweat | James Brown | 209 | Funk | 3,247 |
| A Whole New Thing | Sly and the Family Stone | 816 | Funk | 3,247 |
| In the Right Place | Dr. John | 912 | Funk | 3,247 |
| Head Hunters | Herbie Hancock | 409 | Jazz | 2,075 |
| EnRoute: John Scofield Trio LIVE | John Scofield Trio | 612 | Jazz | 2,075 |
| In a Silent Way | Miles Davis | 65 | Jazz | 2,075 |
| Concierto | Jim Hall | 612 | Jazz | 2,075 |
| My Favorite Things | John Coltrane | 377 | Jazz | 2,075 |
For instance, the cumulative sum for the folk albums is 809 + 246 + 400 + 214 + 412 = 2,081.
Syntax Example #4: OVER (ORDER BY PARTITION BY)
You can also use both PARTITION BY and ORDER BY in OVER(). The rows are divided into segments with PARTITION BY and processed in a given order by ORDER BY.
Using the query below, I can show all the album analytical data and calculate the cumulative sum for each album separately:
SELECT sales_period, album_title, artist, copies_sold, SUM (copies_sold) OVER(PARTITION BY album_title ORDER BY sales_period ASC) AS cumulative_sum_by_album FROM album_catalogue;
I calculate this with the help of the SUM() window function, as I did earlier. I partitioned the window by album. This means that the sum will be cumulated until the function reaches the last row of a particular album. When it reaches another album, it resets and starts accumulating the sum from the beginning.
I also use ORDER BY to instruct the function to cumulate the sum from the earliest to the latest quarter.
| sales_period | album_title | artist | copies_sold | cumulative_sum_by_album |
|---|---|---|---|---|
| 2022_1Q | A Whole New Thing | Sly and the Family Stone | 674 | 674 |
| 2022_2Q | A Whole New Thing | Sly and the Family Stone | 257 | 931 |
| 2022_3Q | A Whole New Thing | Sly and the Family Stone | 666 | 1,597 |
| 2022_4Q | A Whole New Thing | Sly and the Family Stone | 816 | 2,413 |
| 2023_1Q | A Whole New Thing | Sly and the Family Stone | 816 | 3,229 |
| 2023_2Q | A Whole New Thing | Sly and the Family Stone | 302 | 3,531 |
| 2023_3Q | A Whole New Thing | Sly and the Family Stone | 123 | 3,654 |
| 2022_1Q | Blue | Joni Mitchell | 589 | 589 |
| 2022_2Q | Blue | Joni Mitchell | 184 | 773 |
| 2022_3Q | Blue | Joni Mitchell | 256 | 1,029 |
| 2022_4Q | Blue | Joni Mitchell | 412 | 1,441 |
| 2023_1Q | Blue | Joni Mitchell | 412 | 1,853 |
| 2023_2Q | Blue | Joni Mitchell | 99 | 1,952 |
| 2023_3Q | Blue | Joni Mitchell | 995 | 2,947 |
| … | … | … | … | … |
| 2022_1Q | Wednesday Morning, 3 A.M | Simon & Garfunkel | 1,043 | 1,043 |
| 2022_2Q | Wednesday Morning, 3 A.M | Simon & Garfunkel | 437 | 1,480 |
| 2022_3Q | Wednesday Morning, 3 A.M | Simon & Garfunkel | 184 | 1,664 |
| 2022_4Q | Wednesday Morning, 3 A.M | Simon & Garfunkel | 809 | 2,473 |
| 2023_1Q | Wednesday Morning, 3 A.M | Simon & Garfunkel | 809 | 3,282 |
| 2023_2Q | Wednesday Morning, 3 A.M | Simon & Garfunkel | 325 | 3,607 |
| 2023_3Q | Wednesday Morning, 3 A.M | Simon & Garfunkel | 612 | 4,219 |
You can see that the cumulative sum for ‘A Whole New Thing’ is 3,654. The next album (‘Blue’) starts with the next row, so the cumulation is reset: the cumulative sum is the same as the individual sales of the album in the first quarter of 2022. Then, it accumulates until it reaches the next album. The result goes all the way until the last album, which is ‘Wednesday Morning, 3 A.M.’ in our case.
I’ve shown you the most common ways of defining window frames with these examples. But these are not the only ways. You can also use the ROW or RANGE clauses with the syntax and bounds we explained earlier. Don’t worry. You’ll see the practical use of this in real-world examples.
I used only one function, SUM(), throughout these examples. This is one of many window functions; let’s quickly review some of the others.
What Are the Most Common Window Functions?
The most common window functions can be divided into three categories:
- Aggregate Window Functions:
COUNT()– Counts the number of rows within a window.SUM()– Totals given values within a window.AVG()– Calculate the average of given values within a window.MIN()– Finds the smallest value within a window.MAX()– Finds the largest value within a window.
- Ranking Window Functions:
ROW_NUMBER()– Ranks values sequentially, with different ranks for the tied values.RANK()– Ranks values using the same rank for tied values; skips the next rank after the ties (e.g. 1, 2, 2, 4).DENSE_RANK()– Ranks values using the same rank for tied values; doesn’t skip the next rank after the ties (e.g. 1,2,2,3,4).
- Analytic Window Functions:
LEAD()– Gets data from a defined offset (i.e. a stated number of rows) after the current row.LAG()– Gets data from a defined offset (i.e. a stated number of rows) before the current row.
There are more window functions you could find useful. Check them out in our free SQL Window Functions Cheat Sheet.
Real-World SQL Window Function Examples
So far, I’ve focused more on the syntax of SQL window functions. I’ll now show you the most common practical uses of window functions and how they can help data analysts in their work.
These examples will use the same dataset as earlier.
Example #1: Percentage of Total
Let’s show info about each album and its sales in the first quarter of 2023. Additionally, we will show quarterly sales by each genre. Then, let’s calculate how much each album (as a percentage) contributes to genre sales.
SELECT album_title, artist, copies_sold, album_genre, SUM(copies_sold) OVER (PARTITION BY album_genre) AS sales_by_genre, (copies_sold*1.0/ SUM(copies_sold) OVER (PARTITION BY album_genre))*100.0 AS percent_of_genre_sales FROM album_catalogue WHERE sales_period = '2023_1Q' ORDER BY album_genre, copies_sold DESC;
To get the sales by genre, I again use the SUM() window function. In the OVER() clause, I use only PARTITION BY. That way, I can partition the window by the album genre.
In the next code line, I divide the copies sold (of each album) and divide by the genre sales. To do that, simply copy the calculation from the previous line. Then, multiply the quotient by 100 to get the percentage. You’ll notice that I also multiplied copies_sold with 1.0. This is for converting integers to decimal values.
Filter the desired quarter using WHERE. Finally, order the output alphabetically by genre and then descendingly by copies sold.
Here’s the result:
| album_title | artist | copies_sold | album_genre | sales_by_genre | percent_of_genre_sales |
|---|---|---|---|---|---|
| Wednesday Morning, 3 A.M | Simon & Garfunkel | 809 | Folk | 2,081 | 38.88 |
| Blue | Joni Mitchell | 412 | Folk | 2,081 | 19.80 |
| Five Leaves Left | Nick Drake | 400 | Folk | 2,081 | 19.22 |
| The Freewheelin' Bob Dylan | Bob Dylan | 246 | Folk | 2,081 | 11.82 |
| The New Folk Sound of Terry Callier | Terry Callier | 214 | Folk | 2,081 | 10.28 |
| Dirty Mind | Prince | 941 | Funk | 3,247 | 28.98 |
| In the Right Place | Dr. John | 912 | Funk | 3,247 | 28.09 |
| A Whole New Thing | Sly and the Family Stone | 816 | Funk | 3,247 | 25.13 |
| Nasty Gal | Betty Davis | 369 | Funk | 3,247 | 11.36 |
| Cold Sweat | James Brown | 209 | Funk | 3,247 | 6.44 |
| EnRoute: John Scofield Trio LIVE | John Scofield Trio | 612 | Jazz | 2,075 | 29.49 |
| Concierto | Jim Hall | 612 | Jazz | 2,075 | 29.49 |
| Head Hunters | Herbie Hancock | 409 | Jazz | 2,075 | 19.71 |
| My Favorite Things | John Coltrane | 377 | Jazz | 2,075 | 18.17 |
| In a Silent Way | Miles Davis | 65 | Jazz | 2,075 | 3.13 |
Let’s check the calculation for the first row. The album by Simon & Garfunkel sold 809 copies. The total folk album sales for that quarter were 2,081. So the percentage of the individual sales in total genre sales is 809/2,081*100 = 38.88%.
The sum of percentages for each genre should be 100%. Let’s check this on a folk genre: 38.88% + 19.80% + 19.22% + 11.82% + 10.28 % = 100%.
Window Function vs. Aggregate Function vs. GROUP BY
I’m again using the aggregate function as a window function. I could’ve used a simple aggregate SUM() function with GROUP BY to get sales by each genre for the specified quarter. What’s the difference, then?
A window function allows you to show both analytical and aggregate data (individual sales with sales by genre and the quotient of these values), while an aggregate function used with GROUP BY would collapse the individual row and show only the aggregate value (the sum of sales for the quarter).
Solve This Exercise for Practice
Using window functions, rewrite the above query so that it shows the average sale by genre. Also, show how much each album's sales are above or below the genre average (as a percentage). Show only sales from the third quarter of 2023. Show the album title, artist, copies sold, and the album genre. Sort the output ascendingly by genre and individual album sales.
Solution:
SELECT album_title, artist, copies_sold, album_genre, AVG(copies_sold) OVER (PARTITION BY album_genre) AS average_sales_by_genre, ((copies_sold/AVG(copies_sold) OVER (PARTITION BY album_genre))-1)*100 AS pct_from_average FROM album_catalogue WHERE sales_period = '2023_3Q' ORDER BY album_genre, copies_sold;
Output:
| album_title | artist | copies_sold | album_genre | average_sales_by_genre | pct_from_average |
|---|---|---|---|---|---|
| The New Folk Sound of Terry Callier | Terry Callier | 283 | Folk | 561.6 | -49.61 |
| Five Leaves Left | Nick Drake | 321 | Folk | 561.6 | -42.84 |
| The Freewheelin' Bob Dylan | Bob Dylan | 597 | Folk | 561.6 | 6.30 |
| Wednesday Morning, 3 A.M | Simon & Garfunkel | 612 | Folk | 561.6 | 8.97 |
| Blue | Joni Mitchell | 995 | Folk | 561.6 | 77.17 |
| A Whole New Thing | Sly and the Family Stone | 123 | Funk | 533.4 | -76.94 |
| Dirty Mind | Prince | 169 | Funk | 533.4 | -68.32 |
| In the Right Place | Dr. John | 562 | Funk | 533.4 | 5.36 |
| Nasty Gal | Betty Davis | 808 | Funk | 533.4 | 51.48 |
| Cold Sweat | James Brown | 1005 | Funk | 533.4 | 88.41 |
| Concierto | Jim Hall | 263 | Jazz | 464 | -43.32 |
| My Favorite Things | John Coltrane | 302 | Jazz | 464 | -34.91 |
| EnRoute: John Scofield Trio LIVE | John Scofield Trio | 404 | Jazz | 464 | -12.93 |
| Head Hunters | Herbie Hancock | 542 | Jazz | 464 | 16.81 |
| In a Silent Way | Miles Davis | 809 | Jazz | 464 | 74.35 |
Example #2: Rank Data
In this example, I’ll use a window function to rank data. I want to show each distinct album title and its length and rank them by length. The longest album will be ranked first.
SELECT *, RANK() OVER (ORDER BY album_length DESC) AS album_length_rank FROM (SELECT DISTINCT album_title, album_length FROM album_catalogue) AS distinct_album;
Let’s start by explaining the subquery: we use it to select distinct albums and their lengths.
Then, we use the main query to select all the data from the subquery. Now, use the RANK() window function to rank albums. You can also use other ranking functions, depending on your data and tasks.
For the ranking to work the way you want to, use the ORDER BY clause in OVER(). Specify the column by which you want to rank and in what order. In this case, it’s descendingly by length.
Here’s the ranking:
| album_title | album_length | album_length_rank |
|---|---|---|
| EnRoute: John Scofield Trio LIVE | 1:13:48 | 1 |
| The Freewheelin' Bob Dylan | 0:44:14 | 2 |
| Head Hunters | 0:41:52 | 3 |
| Five Leaves Left | 0:41:43 | 4 |
| My Favorite Things | 0:40:25 | 5 |
| Nasty Gal | 0:39:15 | 6 |
| In a Silent Way | 0:38:08 | 7 |
| Concierto | 0:38:02 | 8 |
| A Whole New Thing | 0:38:01 | 9 |
| The New Folk Sound of Terry Callier | 0:37:41 | 10 |
| Blue | 0:36:15 | 11 |
| Cold Sweat | 0:33:43 | 12 |
| In the Right Place | 0:33:22 | 13 |
| Wednesday Morning, 3 A.M | 0:31:38 | 14 |
| Dirty Mind | 0:30:14 | 15 |
Solve This Exercise for Practice
Rank each unique album by its sales within its genre. Show only data for the first quarter of 2023. Show the album title, its sales, genre, and rank. If there are albums with the same number of sales, rank them equally and don't skip the next rank.
Solution:
SELECT *, DENSE_RANK() OVER (PARTITION BY album_genre ORDER BY copies_sold DESC) AS album_sales_rank FROM (SELECT DISTINCT album_title, copies_sold, album_genre FROM album_catalogue WHERE sales_period = '2023_1Q') AS distinct_album;
Output:
| album_title | copies_sold | album_genre | album_sales_rank |
|---|---|---|---|
| Wednesday Morning, 3 A.M | 809 | Folk | 1 |
| Blue | 412 | Folk | 2 |
| Five Leaves Left | 400 | Folk | 3 |
| The Freewheelin' Bob Dylan | 246 | Folk | 4 |
| The New Folk Sound of Terry Callier | 214 | Folk | 5 |
| Dirty Mind | 941 | Funk | 1 |
| In the Right Place | 912 | Funk | 2 |
| A Whole New Thing | 816 | Funk | 3 |
| Nasty Gal | 369 | Funk | 4 |
| Cold Sweat | 209 | Funk | 5 |
| EnRoute: John Scofield Trio LIVE | 612 | Jazz | 1 |
| Concierto | 612 | Jazz | 1 |
| Head Hunters | 409 | Jazz | 2 |
| My Favorite Things | 377 | Jazz | 3 |
| In a Silent Way | 65 | Jazz | 4 |
Example #3: Running Total
In this example, I’ll show a particular album’s sales period, title, artist, and the copies sold. I’ll also add a running total of copies sold that will include three rows: the current row and the two previous. The sum should be calculated from the earliest to the latest quarter.
SELECT sales_period, album_title, artist, copies_sold, SUM(copies_sold) OVER (ORDER BY sales_period ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sales_running_total FROM album_catalogue WHERE album_title = 'In a Silent Way';
I again use the SUM() window function. Then there’s an ORDER BY in OVER() to order the sales so we can sum them ascendingly.
Next, I need to define the moving window frame. The running total should include the current row and the previous two rows. These are the lower and upper bounds specified in the ROWS clause. The lower bound is two previous rows, i.e., BETWEEN 2 PRECEDING. The upper bound is CURRENT ROW. The two bounds are stitched together in a window frame using the keyword AND.
I want to show the calculation for Miles Davis’s album ‘In a Silent Way’, so I filter data using WHERE.
Here are the running totals:
| sales_period | album_title | artist | copies_sold | sales_running_total |
|---|---|---|---|---|
| 2022_1Q | In a Silent Way | Miles Davis | 428 | 428 |
| 2022_2Q | In a Silent Way | Miles Davis | 1,053 | 1,481 |
| 2022_3Q | In a Silent Way | Miles Davis | 19 | 1,500 |
| 2022_4Q | In a Silent Way | Miles Davis | 65 | 1,137 |
| 2023_1Q | In a Silent Way | Miles Davis | 65 | 149 |
| 2023_2Q | In a Silent Way | Miles Davis | 218 | 348 |
| 2023_3Q | In a Silent Way | Miles Davis | 809 | 1,092 |
Let’s check the result and explain what a running total is.
A running total is similar to a cumulative total (or sum), but they’re not the same. The cumulative total will give you the sum of the current row and all the previous rows, i.e., the window frame increases with each row. A running total is a sum within a defined window frame that stays the same size but moves with each row. In our case, the window is defined as the current row and the two previous rows.
Take a look at the highlighted values. The running total for the first quarter in 2022 is 428, the same as the individual sale. There are no previous rows, so the running total includes only the current row.
The next running total is 428 + 1,053 = 1,481. It sums the current and the previous row, as there is only one previous row.
The running total for the third quarter in 2022 is 428 + 1,053 + 19 = 1,500. This is the first time you get the whole window, i.e., the current row and the two previous rows.
As you go to the next row, the window will move but its size will remain the same. The running total for the following quarter is 428 + 1,053 + 19 + 65 = 1,137. It, again, involves the current row and the two previous rows – but different ones compared to the quarter.
Solve this exercise for practice
Rewrite the above query so it calculates the running total for the album ‘The New Folk Sound of Terry Callier’. The running total should be calculated from the earliest to the latest quarter. It should include four quarters: the two previous, the current quarter, and the one following. Also, show the sales period, album title, artist, and the number of copies sold.
Solution:
SELECT sales_period, album_title, artist, copies_sold, SUM(copies_sold) OVER (ORDER BY sales_period ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS sales_running_total FROM album_catalogue WHERE album_title = 'The New Folk Sound of Terry Callier';
Output:
| sales_period | album_title | artist | copies_sold | sales_running_total |
|---|---|---|---|---|
| 2022_1Q | The New Folk Sound of Terry Callier | Terry Callier | 903 | 2,575 |
| 2022_2Q | The New Folk Sound of Terry Callier | Terry Callier | 418 | 2,789 |
| 2022_3Q | The New Folk Sound of Terry Callier | Terry Callier | 1,254 | 3,003 |
| 2022_4Q | The New Folk Sound of Terry Callier | Terry Callier | 214 | 2,641 |
| 2023_1Q | The New Folk Sound of Terry Callier | Terry Callier | 214 | 2,506 |
| 2023_2Q | The New Folk Sound of Terry Callier | Terry Callier | 541 | 1,252 |
| 2023_3Q | The New Folk Sound of Terry Callier | Terry Callier | 283 | 1,038 |
Example #4: Quarter-To-Quarter Difference
In this last example, I’ll show how to use the window functions to calculate the sales difference between quarters:
SELECT *, LAG(quarterly_copies_sold) OVER (ORDER BY sales_period) AS previous_quarter_sales, quarterly_copies_sold - LAG(quarterly_copies_sold) OVER (ORDER BY sales_period) AS quarterly_sales_difference FROM (SELECT sales_period, SUM(copies_sold) AS quarterly_copies_sold FROM album_catalogue GROUP BY sales_period) AS quarterly_sales;
I first write a subquery that calculates the total sales for each quarter. I use the SUM() aggregate function and group the results by the sales period.
Next, I select all the data from the subquery in the main query.
Now I need to get the previous quarter's sales. I’ll write the LAG() window function, which is used to access values from the previous rows. The value I want to access is specified in the function. In this case, it’s the quarterly copies sold from the subquery. By defining the offset argument, the function allows me to define how far back I want to go. I didn’t define it, so the default offset is one. In other words, the function will get the data from the previous row/quarter. But if you want to go two rows/quarters back, then you would write LAG(quarterly_copies_sold, 2).
I also use ORDER BY in OVER() to make sure the values within the frame are sorted from the oldest to the latest quarter.
This use of the window function is so that it’s clearer what I’ll do in the next code line. This is where the actual calculation of comparing the current and previous quarter’s sales is done. It’s simple now: subtract the window function defined above from the column quarterly_copies_sold.
Here’s the output:
| sales_period | quarterly_copies_sold | previous_quarter_sales | quarterly_sales_difference |
|---|---|---|---|
| 2022_1Q | 9,519 | NULL | NULL |
| 2022_2Q | 7,581 | 9,519 | -1,938 |
| 2022_3Q | 4,273 | 7,581 | -3,308 |
| 2022_4Q | 7,403 | 4,273 | 3,130 |
| 2023_1Q | 7,403 | 7,403 | 0 |
| 2023_2Q | 4,956 | 7,403 | -2,447 |
| 2023_3Q | 7,795 | 4,956 | 2,839 |
There are no previous values for 2022_1Q, as there is no previous quarter. The quarterly sales for 2022_2Q are 7,581. The sales in the previous quarter were 9,519. The calculation shows that the current sales are 1,938 copies (7,581 - 9,519) below the previous quarter's sales.
You can analyze the rest of the output the same way.
Solve This Exercise for Practice
Rewrite the above query so it shows the difference between quarterly sales on a year-over-year basis – e.g., compare the first quarter of 2023 with the first quarter of 2022. Show the sales period, copies sold in the quarter, sales for the same quarter in the previous year, and the year-over-year difference between the quarters.
Solution:
SELECT *, LAG(quarterly_copies_sold, 4) OVER (ORDER BY sales_period) AS year_over_year_sales, quarterly_copies_sold - LAG(quarterly_copies_sold, 4) OVER (ORDER BY sales_period) AS year_over_year_difference FROM (SELECT sales_period, SUM(copies_sold) AS quarterly_copies_sold FROM album_catalogue GROUP BY sales_period) AS quarterly_sales;
Output:
| sales_period | quarterly_copies_sold | year_over_year_sales | year_over_year_difference |
|---|---|---|---|
| 2022_1Q | 9,519 | NULL | NULL |
| 2022_2Q | 7,581 | NULL | NULL |
| 2022_3Q | 4,273 | NULL | NULL |
| 2022_4Q | 7,403 | NULL | NULL |
| 2023_1Q | 7,403 | 9,519 | -2,116 |
| 2023_2Q | 4,956 | 7,581 | -2,625 |
| 2023_3Q | 7,795 | 4,273 | 3,522 |
If you want more, here are additional window function examples. For practice materials, take a look at these 11 SQL window functions exercises.
SQL Window Functions: A Window to Better Data Analysis
This turned out to be a pretty comprehensive article about SQL window functions. You learned the window functions and how each crucial part of their syntax works.
You also know there are several categories of window functions. The most commonly used are aggregate, ranking, and analytical window functions. The practical examples showed you how window functions can be used in common data analysis tasks.
Hopefully, you didn’t skip the exercises in the article. If you did, I once again recommend that you solve them. Only through practice can you really bring home what SQL window functions are about.
The richest resource for learning and practicing is our Window Functions course. It is an interactive course that has over 200 hands-on exercises and covers the full syntax of window functions. If you have job interviews lined up, make sure you go through these SQL window functions interview questions. Good luck, and keep on learning SQL!