Featured image of post Understanding SQL Query Plans for Performance Optimization

Understanding SQL Query Plans for Performance Optimization

Query Plans for fun and profit...

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
SELECT users.id,  
       users.NAME,  
       orders.total,  
       products.product_name,  
       suppliers.supplier_name,  
       customer_support.agent_name  
FROM   users  
       JOIN (SELECT orders.id AS order_id,  
                    orders.user_id,  
                    orders.total,  
                    order_products.product_id  
             FROM   orders  
                    JOIN (SELECT order_products.order_id,  
                                 order_products.product_id  
                          FROM   order_products) AS order_products  
                      ON orders.id = order_products.order_id) AS orders  
         ON users.id = orders.user_id  
       JOIN (SELECT products.id AS product_id,  
                    products.product_name,  
                    product_suppliers.supplier_id  
             FROM   products  
                    JOIN (SELECT product_suppliers.product_id,  
                                 product_suppliers.supplier_id  
                          FROM   product_suppliers) AS product_suppliers  
                      ON products.id = product_suppliers.product_id) AS products  
         ON orders.product_id = products.product_id  
       JOIN (SELECT suppliers.id AS supplier_id,  
                    suppliers.supplier_name  
             FROM   suppliers) AS suppliers  
         ON products.supplier_id = suppliers.supplier_id  
       LEFT JOIN (SELECT customer_support.id     AS agent_id,  
                         customer_support.agent_name,  
                         support_tickets.user_id AS ticket_user_id  
                  FROM   customer_support  
                         JOIN (SELECT support_tickets.id,  
                                      support_tickets.user_id  
                               FROM   support_tickets) AS support_tickets  
                           ON customer_support.id = support_tickets.agent_id) AS  
                 customer_support  
              ON users.id = customer_support.ticket_user_id  
  

WHERE  users.id IN (SELECT DISTINCT( user_profiles.user_id )  
                    FROM   user_profiles  
                           JOIN (SELECT user_contacts.user_id AS contact_user_id  
                                 FROM   user_contacts  
                                 WHERE  user_contacts.contact_type = 'email') AS  
                                user_contacts  
                             ON user_profiles.user_id =  
                                user_contacts.contact_user_id  
                    WHERE  user_profiles.profile_status = 'active')  
  

ORDER  BY users.NAME,  
          orders.total DESC;

The above example query retrieves a list of users along with details about their orders, the products they purchased, the suppliers of those products, and (if available) the customer support agents who assisted them. It applies filters to include only active users** who have email contact information.

:)

A Brief History of SQL: From Humble Beginnings to Data Dominance

Ah, SQL—the language that makes data dance! But where did it all begin? Let’s take a trip down memory lane (bring snacks).

In the early 1970s, IBM’s dynamic duo, Donald D. Chamberlin and Raymond F. Boyce, were inspired by Edgar F. Codd’s relational model.

They rolled up their sleeves and developed a language called SEQUEL (Structured English Query Language) to manage and retrieve data stored in IBM’s System R.

Fun fact: SEQUEL had to drop a few vowels to become SQL because of a pesky trademark issue.

Initially, SQL was designed to be the go-to language for managing and retrieving data in relational databases.

Think of it as the universal remote for your data—minus the frustration of pressing the wrong button.

The First 10 Versions of SQL: A Journey Through Time

Here’s a quick rundown of the early milestones in SQL’s evolution:

  1. SQL-86 (1986): The first official standard by ANSI. It was like the Model T of SQL—basic but revolutionary.
  2. SQL-89 (1989): A minor revision, fixing some quirks. Think of it as SQL’s awkward teenage phase.
  3. SQL-92 (1992): A major update introducing new features. SQL was growing up and getting fancy.
  4. SQL:1999 (1999): Added regular expressions, triggers, and more. SQL was now the cool kid on the block.
  5. SQL:2003 (2003): Introduced XML-related features and window functions. SQL was branching out.
  6. SQL:2006 (2006): Focused on XML integration. SQL was getting tech-savvy.
  7. SQL:2008 (2008): Added INSTEAD OF triggers and the TRUNCATE statement. SQL was tidying up.
  8. SQL:2011 (2011): Brought in temporal data support. SQL was getting timely.
  9. SQL:2016 (2016): Added JSON support. SQL was keeping up with the cool kids.
  10. SQL:2019 (2019): Enhanced with more features. SQL was unstoppable.

You can read all about it on the Wikipedia Page.. Wikipedia is NEVER wrong..
SQL Wikipedia page.

Understanding SQL Query Plans: The Treasure Maps to Your Data

So whats a query plan???

Imagine you’re on a treasure hunt.

A SQL query plan is like the map that shows you how the database engine plans to find the treasure (your data).

It breaks down the steps the engine will take to execute your query.

Understanding this map is crucial for identifying any detours or obstacles that might slow down your quest.

THAT is a query plan! (forehead slap here)….

Interpreting Query Plans to Uncover Performance Bottlenecks

By analyzing a query plan, you can spot common performance pitfalls:

  • Full Table Scans: If the plan shows a full table scan, it might be time to introduce some indexes.
  • Missing Indexes: No indexes? No wonder your query is slower than a snail on a treadmill.
  • Expensive Joins: Nested loops and Cartesian joins can be performance killers.
  • Sorting and Aggregations: Without proper indexing, these operations can feel like waiting for a pot to boil.
  • High Estimated Costs or Row Counts: These could indicate your query is biting off more than it can chew.

Example 1: The Case of the Inefficient Query

SQL Query:

1
2
3
4
SELECT * 
FROM employees 
WHERE department = 'HR' 
AND salary > 50000;

( we have all done this.. but none of us will admit it…)

Query Plan Before Optimization:

1
2
Seq Scan on employees  (cost=0.00..5000.00 rows=100 width=100)
  Filter: ((department = 'HR') AND (salary > 50000))
  • Problem: Full table scan. Ouch.
  • Fix: Create an index.

Solution:

1
CREATE INDEX idx_department_salary ON employees(department, salary);

Query Plan After Optimization:

1
2
Index Scan using idx_department_salary on employees  (cost=0.00..200.00 rows=100 width=100)
  Index Cond: ((department = 'HR') AND (salary > 50000))

Now the query is way faster!

Example 2: The Join That Needed a Gym Membership

SQL Query:

1
2
3
4
SELECT orders.id, customers.name, orders.total
FROM orders
JOIN customers ON orders.customer_id = customers.id
WHERE customers.region = 'West';

Query Plan Before Optimization:

1
2
3
Nested Loop  (cost=1000.00..50000.00 rows=10000 width=100)
  -> Seq Scan on customers  (cost=0.00..25000.00 rows=5000 width=50)
  -> Seq Scan on orders  (cost=0.00..25000.00 rows=5000 width=50)
  • Problem: Full table scans in a join. Yikes.
  • Fix: Index the join columns.

Solution:

1
2
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_customers_region ON customers(region);

Query Plan After Optimization:

1
2
3
4
Hash Join  (cost=500.00..10000.00 rows=10000 width=100)
  Hash Cond: (orders.customer_id = customers.id)
  -> Seq Scan on customers  (cost=0.00..2500.00 rows=500 width=50)
  -> Index Scan using idx_orders_customer_id on orders  (cost=0.00..2500.00 rows=500 width=50)

Much better!

Key Ideas

ConceptExplanation
SQL OriginsDeveloped in the 1970s by IBM.
Query PlansExplain how a query is executed.
Full Table ScansIndicate missing indexes.
IndexingCan dramatically improve performance.
JoinsNeed indexes on foreign keys for efficiency.