SQL | GROUP BY
The SQL GROUP BY
clause is a powerful tool used to organize data into groups based on shared values in one or more columns. It is most often used with aggregate functions like SUM
, COUNT
, AVG
, MIN
and MAX
to perform summary operations on each group, helping us extract meaningful analysis from large datasets.
How to use GROUP BY in SQL
The GROUP BY statement in SQL is used to arrange identical data into groups based on specified columns. If a particular column has the same values in multiple rows, the GROUP BY clause will group these rows together. Whether we are analyzing sales by region, users by age group or orders by product category, the GROUP BY
clause helps transform raw data into structured reports.
Key Points About GROUP BY
- GROUP BY clause is used with the SELECT statement.
- In a SQL query, the GROUP BY clause is placed after the WHERE clause, but before HAVING Clause and ORDER BY clause
WHERE
clause filters rows before grouping, theHAVING
clause is used to apply conditions on the grouped data- Order of Execution in a SQL query is SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY.
Syntax:
SELECT column1, function_name(column2)
FROM table_name
GROUP BY column1, column2
Parameters:
- function_name: Name of the function used for example, SUM() , AVG().
- table_name: Name of the table.
- condition: Condition used.
Examples of GROUP BY in SQL
Let's assume that we have two tables Employee and Student as follows. After adding two tables we will do some specific operations to learn about GROUP BY. Insert some random data into a table and then we will perform some operations in GROUP BY.
Employee Table
CREATE TABLE emp (
emp_no INT PRIMARY KEY,
name VARCHAR(50),
sal DECIMAL(10,2),
age INT
);
INSERT INTO emp (emp_no, name, sal, age) VALUES
(1, 'Aarav', 50000.00, 25),
(2, 'Aditi', 60000.50, 30),
(3, 'Aarav', 75000.75, 35),
(4, 'Anjali', 45000.25, 28),
(5, 'Chetan', 80000.00, 32),
(6, 'Divya', 65000.00, 27),
(7, 'Gaurav', 55000.50, 29),
(8, 'Divya', 72000.75, 31),
(9, 'Gaurav', 48000.25, 26),
(10, 'Divya', 83000.00, 33);
SELECT * from emp;
Output:

Student Table
CREATE TABLE student (
name VARCHAR(50),
year INT,
subject VARCHAR(50)
);
INSERT INTO student (name, year, subject) VALUES
('Alice', 1, 'Mathematics'),
('Bob', 2, 'English'),
('Charlie', 3, 'Science'),
('David', 1, 'Mathematics'),
('Emily', 2, 'English'),
('Frank', 3, 'Science');
Output:

Example 1 : Group By Single Column
Group By single column means, placing all the rows with the same value of only that particular column in one group. Consider the query for Calculating the Total Salary of each Employee by their name in the emp table.
Query:
SELECT name, SUM(sal) FROM emp
GROUP BY name;
Output:

Explanations:
As you can see in the above output, the rows with duplicate NAMEs are grouped under the same NAME and their corresponding SALARY is the sum of the SALARY of duplicate rows. The SUM() function of SQL is used here to calculate the sum. The NAMES that are added are Aarav, Divya and Gaurav.
Example 2 : Group By Multiple Columns
Group by multiple columns is say, for example, GROUP BY column1, column2. This means placing all the rows with the same values of columns column 1 and column 2 in one group. This SQL query groups student records by both SUBJECT
and YEAR
and then counts the number of records (i.e., students) in each of those groups.
Query:
SELECT SUBJECT, YEAR, Count(*)
FROM Student
GROUP BY SUBJECT, YEAR;
Output:

Explantion:
- As we can see in the above output students with same SUBJECT and YEAR are placed in the same group.
- Those whose only SUBJECT is same but YEAR belong to different groups. So here we have grouped the table according to two columns or more than one column.
- The Grouped subject and years are (English,2) , (Mathematics,1) and (Science,3). The above mentioned all groups and years are repeated twice.
HAVING Clause in GROUP BY Clause
We know that WHERE clause is used to place conditions on columns but what if we want to place conditions on groups? This is where the HAVING clause comes into use. We can use the HAVING clause to filter results based on the conditions applied to grouped data, such as sums, averages or counts.
One important point to note is that we cannot use aggregate functions like SUM()
, COUNT()
, AVG()
, etc., directly in the WHERE
clause. Instead, we use the HAVING
clause when we need to filter data based on the result of these aggregate functions.
Syntax:
SELECT column1, function_name(column2)
FROM table_name
WHERE condition
GROUP BY column1, column2
HAVING condition
ORDER BY column1, column2;
Key Terms
- function_name: Name of the function used for example, SUM() , AVG().
- table_name: Name of the table.
- condition: Condition used.
Example:
SELECT NAME, SUM(sal) FROM Emp
GROUP BY name
HAVING SUM(sal)>50000;
Output:

Explanation:
In the result, only employees whose total salary (SUM(sal)
) exceeds 50,000 are displayed. For example, if Anjali has a total salary less than 50,000, she will be excluded from the output.
Conclusion
The GROUP BY function in SQL organizes identical data into groups, enabling aggregate analysis on each group. It is commonly used with aggregate functions like SUM(), COUNT(), AVG(), etc., to summarize data efficiently. The HAVING clause further refines the results by applying conditions to these grouped records. GROUP BY can operate on single or multiple columns, making it a versatile tool for data retrieval and reporting.