PostgreSQL - CUME_DIST Function
The PostgreSQL CUME_DIST()
function is a powerful analytical tool used to determine the relative position of a value within a set of given values. This function helps compute the cumulative distribution of values in a result set, which can be particularly useful in statistical analysis and reporting.
Let us better understand the CUME_DIST Function in PostgreSQL from this article.
Syntax
CUME_DIST() OVER (
[PARTITION BY partition_expression, ... ]
ORDER BY sorting_expression [ASC | DESC], ...
)
Parameters:
- PARTITION BY: This optional clause divides the result set into partitions where the function is applied. If not specified, PostgreSQL treats the entire result set as a single partition.
- ORDER BY: This clause sorts the rows in each partition where the function is applied.
Return Value:
The CUME_DIST() function returns a double-precision value between 0 and 1:
0 < CUME_DIST() <= 1
PostgreSQL CUME_DIST Function Examples
Let us take a look at some of the examples of CUME_DIST Function in PostgreSQL to better understand the concept.
Example 1: Sales Percentile for 2018
First, create a table named 'sales_stats' that stores the sales revenue by employees:
CREATE TABLE sales_stats(
name VARCHAR(100) NOT NULL,
year SMALLINT NOT NULL CHECK (year > 0),
amount DECIMAL(10, 2) CHECK (amount >= 0),
PRIMARY KEY (name, year)
);
INSERT INTO
sales_stats(name, year, amount)
VALUES
('Raju kumar', 2018, 120000),
('Alibaba', 2018, 110000),
('Gabbar Singh', 2018, 150000),
('Kadar Khan', 2018, 30000),
('Amrish Puri', 2018, 200000),
('Raju kumar', 2019, 150000),
('Alibaba', 2019, 130000),
('Gabbar Singh', 2019, 180000),
('Kadar Khan', 2019, 25000),
('Amrish Puri', 2019, 270000);
The following query returns the sales amount percentile for each sales employee in 2018.
Query:
SELECT
name,
year,
amount,
CUME_DIST() OVER (
ORDER BY amount
)
FROM
sales_stats
WHERE
year = 2018;
Output:
Example 2: Sales Percentile for 2018 and 2019
The following query uses the CUME_DIST() function to calculate the sales percentile for each sales employee in 2018 and 2019.
Query:
SELECT
name,
year,
amount,
CUME_DIST() OVER (
PARTITION BY year
ORDER BY amount
)
FROM
sales_stats;
Output:
Important Points About PostgreSQL CUME_DIST Function
- The
CUME_DIST()
function calculates the cumulative distribution of a value in a dataset.- When there are ties (duplicate values) in the ordering column,
CUME_DIST()
assigns the same cumulative distribution value to each tied row.- The result of
CUME_DIST()
is in double-precision. Be mindful of this when performing further calculations or comparisons with the result.- CUME_DIST() and PERCENT_RANK() both return values between 0 and 1, but they differ in calculation. CUME_DIST() shows the proportion of rows with values less than or equal to the current row, while PERCENT_RANK() indicates the relative rank of the current row within the partition.