1. Analytic Functions in SQL
Analytic functions (also called window functions) perform calculations across a specific range of rows (a window) related to the current row. Unlike aggregate functions (which collapse multiple rows into one), analytic functions retain individual row results while still providing aggregated insights.
πΉ Key Features of Analytic Functions
- They operate over a defined window of rows.
- They do not group the result set into a single row.
- They require an OVER() clause.
πΉ Common Analytic Functions
Function | Description |
---|
ROW_NUMBER() | Assigns a unique row number to each row in a partition. |
RANK() | Assigns a rank to each row, allowing for ties (skips numbers for ties). |
DENSE_RANK() | Similar to RANK() , but without skipping numbers for ties. |
NTILE(n) | Distributes rows into n roughly equal buckets. |
LAG(column, n) | Returns the value of the column from n rows before the current row. |
LEAD(column, n) | Returns the value of the column from n rows after the current row. |
SUM() OVER() | Calculates a running total. |
AVG() OVER() | Computes a moving average. |
πΉ Example: Using ROW_NUMBER()
Letβs say we have a sales
table:
id | customer | amount |
---|
1 | Alice | 100 |
2 | Bob | 200 |
3 | Alice | 150 |
4 | Bob | 250 |
5 | Charlie | 300 |
To rank each customerβs purchases:
1
2
3
4
5
| SELECT
customer,
amount,
ROW_NUMBER() OVER(PARTITION BY customer ORDER BY amount DESC) AS row_num
FROM sales;
|
Breakdown:
PARTITION BY customer
: Resets the row numbering for each customer.ORDER BY amount DESC
: Orders within each partition by highest amount.ROW_NUMBER()
: Assigns a unique row number per partition.
Result:
customer | amount | row_num |
---|
Alice | 150 | 1 |
Alice | 100 | 2 |
Bob | 250 | 1 |
Bob | 200 | 2 |
Charlie | 300 | 1 |
2. Common Table Expressions (CTEs)
A Common Table Expression (CTE) is a temporary result set that exists only within the execution of a single query. It helps improve readability and modularity.
πΉ Why Use CTEs?
- Makes complex queries easier to read.
- Can be referenced multiple times within the same query.
- Helps avoid subquery repetition.
πΉ Basic Syntax
1
2
3
4
| WITH cte_name AS (
-- Your query here
)
SELECT * FROM cte_name;
|
πΉ Example: Using a CTE
Consider a sales
table where we want to calculate the total sales per customer, and then filter those with sales above $200.
1
2
3
4
5
6
7
8
| WITH total_sales AS (
SELECT customer, SUM(amount) AS total_amount
FROM sales
GROUP BY customer
)
SELECT *
FROM total_sales
WHERE total_amount > 200;
|
Breakdown:
- The CTE (
total_sales
) first computes SUM(amount)
per customer. - The main query filters results where
total_amount > 200
.
π‘ Alternative (Without CTEs): Weβd have to repeat the aggregation:
1
2
3
4
5
6
| SELECT * FROM (
SELECT customer, SUM(amount) AS total_amount
FROM sales
GROUP BY customer
) t
WHERE total_amount > 200;
|
CTEs make queries much cleaner! π
Combining Analytic Functions and CTEs
Sometimes, CTEs and analytic functions work well together. Let’s rank sales using a CTE:
1
2
3
4
5
6
7
8
| WITH ranked_sales AS (
SELECT
customer,
amount,
ROW_NUMBER() OVER(PARTITION BY customer ORDER BY amount DESC) AS rank
FROM sales
)
SELECT * FROM ranked_sales WHERE rank = 1;
|
πΉ Purpose: This finds the highest sale per customer.
3. Temporary Table
A Temporary Table is a physical table stored in tempdb
(SQL Server) or a temporary schema (MySQL, PostgreSQL). It can store intermediate results and be referenced multiple times within a session.
Characteristics of Temporary Tables
- Scope:
#TempTable
(local) exists only in the current session.##GlobalTempTable
(global) can be accessed by multiple sessions.
- Performance: Stored in memory/disk, better for large data sets.
- Reusability: Can be modified after creation (INSERT, UPDATE, DELETE).
- Indexing: Can have indexes for performance optimization.
- Syntax:
1
2
3
4
5
6
7
8
| CREATE TABLE #temp_table (
id INT,
name VARCHAR(50)
);
INSERT INTO #temp_table VALUES (1, 'Alice');
SELECT * FROM #temp_table;
|
Example: Using a Temporary Table to Store Sales Data
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
| -- Create a temporary table
CREATE TABLE #TempSales (
customer VARCHAR(50),
total_amount DECIMAL(10,2)
);
-- Insert aggregated data into temp table
INSERT INTO #TempSales
SELECT customer, SUM(amount)
FROM sales
GROUP BY customer;
-- Query from temp table multiple times
SELECT * FROM #TempSales WHERE total_amount > 500;
SELECT COUNT(*) FROM #TempSales;
-- Drop the temp table when done
DROP TABLE #TempSales;
|
πΉ What Happens?
#TempSales
is physically created in memory (or disk if large).- The data persists until the session ends or the table is dropped.
- You can run multiple queries against it, unlike a CTE.
** Key Differences Between CTEs and Temp Tables**
Feature | CTE | Temporary Table |
---|
Scope | Exists only for the duration of the query | Exists for the session or until explicitly dropped |
Persistence | Not stored physically | Stored in tempdb (SQL Server) or temporary schema |
Performance | Best for small, in-memory calculations | Better for large datasets that need indexing |
Reusability | Cannot be reused outside the query | Can be queried multiple times |
Modifiability | Read-only | Can be modified (INSERT, UPDATE, DELETE) |
Indexing | No indexes | Can have indexes |
** When to Use What?**
Scenario | Best Choice |
---|
Breaking down complex queries for better readability | β
CTE |
One-time use within a single query | β
CTE |
Need to reuse data in multiple queries | β
Temp Table |
Working with large datasets that need indexing | β
Temp Table |
Needing to modify data after creation | β
Temp Table |
** 4. Analytic Function Syntax in Different Databases**
** Oracle**
Oracle has the most mature support for analytic functions, available since Oracle 8i.
Basic Example: Ranking Orders by Amount
1
2
3
4
| SELECT
customer_id, order_id, amount,
RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS order_rank
FROM orders;
|
Running Total Example
1
2
3
4
| SELECT
customer_id, order_id, amount,
SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_id) AS running_total
FROM orders;
|
πΉ Oracle-Specific Features
- Supports advanced functions like
CUME_DIST()
and PERCENT_RANK()
. - Performance-optimized window functions with Oracle’s query optimizer.
- Can use RANGE and ROWS in the
OVER()
clause.
** SQL Server (MSSQL)**
SQL Server introduced analytic functions in SQL Server 2012, making it one of the later adopters.
Basic Example: Ranking Employees by Salary
1
2
3
4
| SELECT
employee_id, department_id, salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank_in_department
FROM employees;
|
Lag and Lead Functions Example
1
2
3
4
5
| SELECT
employee_id, salary,
LAG(salary, 1, 0) OVER (ORDER BY employee_id) AS previous_salary,
LEAD(salary, 1, 0) OVER (ORDER BY employee_id) AS next_salary
FROM employees;
|
πΉ SQL Server-Specific Features
- Full support for analytic functions (since 2012).
- Performance tuning required for large datasets.
- Limited statistical functions (compared to Oracle/PostgreSQL).
** PostgreSQL**
PostgreSQL introduced full analytic function support in version 8.4 and provides more flexibility than SQL Server.
Basic Example: Ranking Products by Sales
1
2
3
4
| SELECT
product_id, category_id, sales,
RANK() OVER (PARTITION BY category_id ORDER BY sales DESC) AS rank
FROM sales;
|
Running Sum Example
1
2
3
4
| SELECT
customer_id, amount,
SUM(amount) OVER (PARTITION BY customer_id ORDER BY transaction_date) AS running_sum
FROM transactions;
|
πΉ PostgreSQL-Specific Features
- Supports advanced statistical window functions (
CUME_DIST()
, PERCENT_RANK()
). - Supports custom aggregate window functions (not available in Oracle or SQL Server).
- Better performance for large datasets (compared to SQL Server).
** MySQL (8.0+)**
MySQL introduced window functions in MySQL 8.0 (before that, they were unsupported).
Basic Example: Ranking Customers
1
2
3
4
| SELECT
customer_id, order_id, amount,
RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rank
FROM orders;
|
Running Total Example
1
2
3
4
| SELECT
customer_id, order_id, amount,
SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_id) AS running_total
FROM orders;
|
πΉ MySQL-Specific Limitations
- Lacks statistical functions (
CUME_DIST()
, PERCENT_RANK()
). - Performance is not optimized for large datasets.
- No custom aggregate window functions (unlike PostgreSQL).
π Key Differences Across Databases
Feature | Oracle | SQL Server (MSSQL) | PostgreSQL | MySQL (8.0+) |
---|
First to support window functions? | β
Yes (Oracle 8i) | β No (SQL Server 2012) | β
Yes (PostgreSQL 8.4) | β No (MySQL 8.0) |
Ranking functions (RANK() , DENSE_RANK() , ROW_NUMBER() ) | β
Yes | β
Yes | β
Yes | β
Yes |
Lag/Lead functions (LAG() , LEAD() ) | β
Yes | β
Yes | β
Yes | β
Yes |
Running totals (SUM() OVER() , AVG() OVER() ) | β
Yes | β
Yes | β
Yes | β
Yes |
Statistical functions (CUME_DIST() , PERCENT_RANK() ) | β
Yes | β
Yes | β
Yes | β No |
Custom aggregate window functions | β No | β No | β
Yes | β No |