How to Select the First Row of Each GROUP BY in SQL?

Last Updated : 05 Dec, 2024

Selecting the first row of each group in SQL is a common requirement in database queries, especially when working with grouped data. SQL GROUP BY queries are highly effective for organizing data into logical groups, and additional techniques like window functions help to isolate the first row from each group

In this article, we will explore multiple methods to efficiently select the first row of each group in SQL, providing comprehensive insights through syntax, examples, and outputs. By the end of this article, we will have a clear understanding of how to use GROUP BY with aggregate functions.

Selecting the First Row of Each Group in SQL

The GROUP BY statement in SQL is used to group rows based on the values in one or more columns. To fetch only the first row from each group, additional techniques like aggregate functions (e.g., MIN(), MAX()) or window functions (e.g., ROW_NUMBER()) are often used. Let’s explore these methods in detail.

Syntax

SELECT * FROM yourTable WHERE columnName = (SELECT MIN(columnName) FROM yourTable GROUP BY groupColumnName)

This query selects all rows from ‘ yourTable ‘ where the value in ‘ columnName ‘ is equal to the minimum value of ‘ columnName ‘ in each group defined by ‘ groupColumnName ‘.

Creating and Inserting Data into the `users` Table

To demonstrate the SQL queries effectively, let's start by creating the users table and inserting some sample data.

Step 1: Create the Table

CREATE TABLE users (
    user_id INT,
    username VARCHAR(50),
    start_date DATE
);

Step 2: Insert Sample Data

INSERT INTO users (user_id, username, start_date)
VALUES
    (1, 'Jeff', '2021-08-02'),
    (2, 'Mack', '2021-01-02'),
    (2, 'Jee', '2021-05-23'),
    (3, 'Harry', '2020-04-03'),
    (4, 'Merry', '2022-04-03'),
    (5, 'Twinkle', '2022-03-09');

Step 3: View the user table

Select * FROM users;

Output

Method 1: Using GROUP BY with Aggregate Functions

To select the first row of each group in SQL, you can use the ' GROUP BY ' clause in combination with an aggregate function such as ' MIN() ' or ' MAX() '. For example, the following query selects the earliest start date for each group of records with the same ' user_id ':

Finding the Earliest Start Date for Each User

For example, if we want to find the earliest start date for each user, we can use the MIN() function to get the first row based on the start date.

Query:

SELECT MIN(start_date) 
AS earliest_start_date
FROM users
GROUP BY user_id

Output

earliest_start_date
2021-08-02
2021-01-02
2020-04-03
2022-04-03
2022-03-09

Explanation:

This query uses the ' MIN() ' function to find the earliest start date for each group of records with the same ' user_id '. The ' GROUP BY ' clause groups the records by ' user_id ', and the ' SELECT ' statement returns the minimum start date for each group.

Method 2: Using ROW_NUMBER() Window Function

The ROW_NUMBER() function is part of the window functions in SQL, which are more flexible than simple aggregation. This method allows us to assign a unique number to each row within a group and then filter for the first row.

Selecting the First Row per User Based on Start Date

In this approach, we will use ROW_NUMBER() to rank each row within its group by the start_date. The first row for each group will be assigned a ROW_NUMBER of 1.

Query:

SELECT *
FROM (
 SELECT
     *,
     ROW_NUMBER() OVER (PARTITION BY user_id 
     ORDER BY start_date ASC) AS row_number
 FROM users
) t
WHERE t.row_number = 1

Output

Row_Number-Window-Function- — Using ROW_NUMBER() Window Functions

Explanation:

The ROW_NUMBER() function assigns a unique row number to each row within a group (specified by the PARTITION BY clause).
The rows are ordered by start_date in ascending order, meaning the first record for each user will get a row number of 1.
The outer query filters the results to include only rows where row_number = 1, ensuring that only the first row of each group is returned.

Method 3: Fetching Specific Columns

We can modify this query to return only specific columns by replacing the ' * ' in the ' SELECT ' clause with a comma-separated list of the columns we want to include in the result set. For example, the following query returns the ' user_id ' and ' start_date ' columns for the first row of each group:

Query:

SELECT user_id, start_date
FROM (
 SELECT
     user_id,
     start_date,
 ROW_NUMBER() OVER (PARTITION BY user_id 
 ORDER BY start_date ASC) AS row_number
 FROM users
) t
WHERE t.row_number = 1

Output

user_id	start_date
1	2021-08-02
2	2021-01-02
3	2020-04-03
4	2022-04-03
5	2022-03-09

Explanation:

Here, only the user_id and start_date columns are selected from the result set, reducing the data load and improving query performance.

Comparison of Methods

Method	Pros	Cons
GROUP BY with MIN()/MAX()	Simple and easy to use.	Limited to numeric and date-based comparisons.
ROW_NUMBER() Function	Highly flexible and works for complex queries.	Slightly more complex syntax.
Specific Columns	Optimized for performance by fetching fewer columns.	Requires additional query modification.

Conclusion

In this article, we explained multiple ways to select the first row of each group in SQL. Each method whether using GROUP BY with aggregate functions or the ROW_NUMBER() window function has its unique advantages depending on the use case and database environment. By Using these techniques, we can efficiently retrieve grouped data while optimizing query performance.