PostgreSQL - CUME_DIST Function

Last Updated : 18 Jul, 2024

The PostgreSQL CUME_DIST() function is a window function that determines the relative position of a value within a set of values. It computes the cumulative distribution of each row's value in a result set, that is, the fraction of rows that come before it or tie with it in the chosen sort order. This makes it useful for percentile-style statistical analysis and reporting.

This article explains the syntax and return value of CUME_DIST() in PostgreSQL and walks through examples.

Syntax

CUME_DIST() OVER (
    [PARTITION BY partition_expression, ... ]
    ORDER BY sorting_expression [ASC | DESC], ...
)

Parameters:

PARTITION BY: This optional clause divides the result set into partitions, and the function is applied to each partition separately. If it is omitted, PostgreSQL treats the entire result set as a single partition.
ORDER BY: This clause sets the order of rows within each partition, and CUME_DIST() is computed according to that order. Rows with equal sort values are peers and receive the same result.

Return Value:

The CUME_DIST() function returns a double-precision value between 0 and 1:

0 < CUME_DIST() <= 1
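The bounds follow from how the value is computed. PostgreSQL's documentation defines CUME_DIST() as the number of partition rows preceding or peer with the current row, divided by the total number of partition rows. Writing P for the current partition and r for the current row (notation introduced here for illustration only):

```latex
\mathrm{CUME\_DIST}(r) = \frac{\left|\{\, s \in P : s \text{ precedes } r \text{ or is a peer of } r \,\}\right|}{|P|},
\qquad
\frac{1}{|P|} \le \mathrm{CUME\_DIST}(r) \le 1
```

The current row always counts toward the numerator, so the result is never 0, and rows in the last peer group count every row in the partition, so they get exactly 1.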

PostgreSQL CUME_DIST Function Examples

Let us look at some examples of the CUME_DIST() function in PostgreSQL.

Example 1: Sales Percentile for 2018

First, create a table named 'sales_stats' that stores each employee's sales revenue per year:

CREATE TABLE sales_stats(
    name VARCHAR(100) NOT NULL,
    year SMALLINT NOT NULL CHECK (year > 0),
    amount DECIMAL(10, 2) CHECK (amount >= 0),
    PRIMARY KEY (name, year)
);
INSERT INTO 
    sales_stats(name, year, amount)
VALUES
    ('Raju kumar', 2018, 120000),
    ('Alibaba', 2018, 110000),
    ('Gabbar Singh', 2018, 150000),
    ('Kadar Khan', 2018, 30000),
    ('Amrish Puri', 2018, 200000),
    ('Raju kumar', 2019, 150000),
    ('Alibaba', 2019, 130000),
    ('Gabbar Singh', 2019, 180000),
    ('Kadar Khan', 2019, 25000),
    ('Amrish Puri', 2019, 270000);

The following query returns the sales amount percentile for each sales employee in 2018.

Query:

SELECT 
    name,
    year, 
    amount,
    CUME_DIST() OVER (
        ORDER BY amount
    ) 
FROM 
    sales_stats
WHERE 
    year = 2018;

Output:

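The output lists the five 2018 rows. Each row's value is the fraction of 2018 rows with an amount less than or equal to its own: 0.2 for Kadar Khan (30000), 0.4 for Alibaba (110000), 0.6 for Raju kumar (120000), 0.8 for Gabbar Singh (150000), and 1 for Amrish Puri (200000).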
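The Example 1 query leaves the window column unnamed, so PostgreSQL labels it cume_dist. The sketch below assumes you want a named column expressed as a percentage: it adds an alias and scales the value by 100. Because CUME_DIST() returns double precision and PostgreSQL's two-argument ROUND() is defined for numeric, the value is cast before rounding. The sales_2018 CTE and the percentile alias are illustrative names, not part of the original example; the CTE repeats the 2018 rows so the snippet runs without the sales_stats table.

```sql
-- Hypothetical CTE reproducing the 2018 rows of sales_stats,
-- so this sketch runs on its own.
WITH sales_2018(name, amount) AS (
    VALUES
        ('Raju kumar',   120000),
        ('Alibaba',      110000),
        ('Gabbar Singh', 150000),
        ('Kadar Khan',    30000),
        ('Amrish Puri',  200000)
)
SELECT
    name,
    amount,
    -- CUME_DIST() returns double precision; cast to numeric because
    -- ROUND(value, digits) is defined for numeric, not double precision.
    ROUND((CUME_DIST() OVER (ORDER BY amount) * 100)::numeric, 1) AS percentile
FROM sales_2018
ORDER BY amount;
```

Against the real table, select from sales_stats with WHERE year = 2018, as in the original query, instead of using the CTE.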

Example 2: Sales Percentile for 2018 and 2019

The following query uses the CUME_DIST() function with PARTITION BY year to calculate the sales percentile for each sales employee separately within 2018 and within 2019.

Query:

SELECT 
    name,
    year,
    amount,
    CUME_DIST() OVER (
        PARTITION BY year
        ORDER BY amount
    )
FROM 
    sales_stats;

Output:

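Because PARTITION BY year restarts the calculation for each year, the 2018 and 2019 rows are evaluated separately. The employees rank in the same order in both years, so each year runs from 0.2 for Kadar Khan (the lowest amount) through 0.4, 0.6, and 0.8 for Alibaba, Raju kumar, and Gabbar Singh, to 1 for Amrish Puri (the highest amount).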
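A common follow-up is to keep only the top performers in each year. Window functions are not allowed in a WHERE clause, so this minimal sketch computes CUME_DIST() in a CTE and filters in the outer query. Ordering by amount DESC makes the value the fraction of that year's rows with an amount greater than or equal to the current one, so cd <= 0.2 keeps roughly the top 20 percent. The sales, ranked, and cd names are hypothetical, and the inline VALUES list copies the sales_stats rows so the query is self-contained.

```sql
-- Hypothetical inline copy of the sales_stats rows.
WITH sales(name, year, amount) AS (
    VALUES
        ('Raju kumar',   2018, 120000),
        ('Alibaba',      2018, 110000),
        ('Gabbar Singh', 2018, 150000),
        ('Kadar Khan',   2018,  30000),
        ('Amrish Puri',  2018, 200000),
        ('Raju kumar',   2019, 150000),
        ('Alibaba',      2019, 130000),
        ('Gabbar Singh', 2019, 180000),
        ('Kadar Khan',   2019,  25000),
        ('Amrish Puri',  2019, 270000)
),
ranked AS (
    SELECT
        name,
        year,
        amount,
        -- Descending order: the highest amount in each year gets the smallest value.
        CUME_DIST() OVER (PARTITION BY year ORDER BY amount DESC) AS cd
    FROM sales
)
-- Filter in the outer query, since window functions cannot appear in WHERE.
SELECT name, year, amount, cd
FROM ranked
WHERE cd <= 0.2
ORDER BY year;
```

With five rows per year, only the highest amount in each year (Amrish Puri in both years) has a value of 1/5, which passes the filter. For thresholds that do not divide the row count evenly, keep in mind that cd is double precision.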

Important Points About PostgreSQL CUME_DIST Function

The CUME_DIST() function calculates the cumulative distribution of a value within its partition, or within the whole result set when PARTITION BY is omitted.
When there are ties (duplicate values) in the ordering column, CUME_DIST() assigns the same value to every tied row, because all peers of the current row are counted in the numerator.
The result of CUME_DIST() is of type double precision. Floating-point rounding can affect exact equality comparisons, and functions such as ROUND(value, digits) need the result cast to numeric first.
CUME_DIST() and PERCENT_RANK() both return values between 0 and 1, but they are calculated differently. CUME_DIST() is the proportion of rows that precede or tie with the current row, so it is always greater than 0. PERCENT_RANK() is (rank - 1) / (total rows - 1), so the first row always gets 0.
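To see ties and the PERCENT_RANK() difference side by side, here is a small self-contained query over a hypothetical four-row sample with a tie at 20. Both tied rows get CUME_DIST() = 3/4 = 0.75, because three of the four rows are less than or equal to 20, while PERCENT_RANK() gives (2 - 1) / (4 - 1), about 0.333, for the same rows. The manual_cume_dist column is an illustrative cross-check: COUNT(*) with an ORDER BY uses the default frame, which runs from the start of the partition through the current row's last peer, so dividing it by the partition size reproduces CUME_DIST().

```sql
-- Hypothetical sample with a tie at 20 to show how peers are treated.
SELECT
    x,
    CUME_DIST()    OVER (ORDER BY x) AS cume_dist,
    PERCENT_RANK() OVER (ORDER BY x) AS percent_rank,
    -- Default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    -- includes all peers, so this ratio matches CUME_DIST().
    COUNT(*) OVER (ORDER BY x)::double precision / COUNT(*) OVER () AS manual_cume_dist
FROM (VALUES (10), (20), (20), (30)) AS t(x)
ORDER BY x;
```

The first row gets 0.25 from CUME_DIST() but 0 from PERCENT_RANK(), which shows why only PERCENT_RANK() can return 0.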


RajuKumar19
