Introduction
SQL subselects (or subqueries) and GROUP BY—two of the most misunderstood yet incredibly powerful tools in SQL.
One lets you nest queries inside queries like SQL-ception, and the other allows you to aggregate data like a boss.
A Brief History of Subselects and GROUP BY
The Dark Ages of SQL (Before Subqueries)
Once upon a time, in the early days of relational databases, queries were simple—you could only SELECT, INSERT, UPDATE, and DELETE. But what if you wanted to:
- Find customers who placed an order last month?
- Retrieve the highest salary per department?
- Get the total revenue per year?
Without subqueries and GROUP BY, SQL was like a caveman with a rock, limited to basic operations.
The Enlightenment: The Birth of Subqueries
At some point, database wizards thought, “What if we could put a query INSIDE another query?” Boom. Subqueries were born. Now, SQL could filter data dynamically, using results from one query inside another.
The Rise of GROUP BY
But what if you wanted totals, averages, or counts? The solution: GROUP BY, which allowed SQL to aggregate data neatly into summary rows.
And thus, the world of SQL became far more powerful—and slightly more confusing.
How Subselects Work
A subselect (or subquery) is a query inside another query. It can be used in:
- SELECT statements (to fetch calculated values).
- WHERE clauses (to filter dynamically).
- FROM clauses (treating a subquery as a table).
Example 1: Subselect in WHERE Clause
Find employees who earn more than the company’s average salary.
MSSQL / MySQL / PostgreSQL
|
|
How it works:
- The inner query
(SELECT AVG(salary) FROM employees)
calculates the average salary. - The outer query retrieves employees earning above that average.
Example 2: Subselect in FROM Clause
Find the highest salary per department.
MSSQL / MySQL / PostgreSQL
|
|
But what if you need more details? That’s where subqueries in the FROM clause help.
|
|
Here, the subquery calculates the max salary per department, and the main query finds employees matching those salaries.
How GROUP BY Works
GROUP BY is used to aggregate data into groups, typically with functions like:
COUNT()
– Counts the number of rows.SUM()
– Adds up values.AVG()
– Calculates averages.MAX()
/MIN()
– Find the highest or lowest values.
Example 1: Counting Orders per Customer
|
|
This gives you one row per customer, showing how many orders they placed.
Example 2: Sales Revenue Per Year
|
|
Now we have total revenue per year.
Common Mistakes and How to Avoid Them
1. Using GROUP BY Without Aggregation
❌ Wrong:
|
|
SQL will complain because name and salary aren’t aggregated.
✅ Right:
|
|
2. Using a Subquery When a JOIN is Faster
❌ Inefficient Subquery:
|
|
✅ Better Approach Using JOIN:
|
|
Joins are usually faster than subqueries, so use them when possible!
Key Ideas
- Subqueries allow nested queries, useful for filtering, calculations, and table-like operations.
- GROUP BY aggregates data, making it easier to summarize information.
- Use JOINs instead of subqueries when performance matters.
- SQL became much more powerful when subqueries and GROUP BY were introduced.