SQL Query Optimization

Essential patterns for diagnosing and fixing slow SQL queries across relational databases.

Core Principles

Measure before optimizing - Always read the EXPLAIN plan before rewriting a query
Indexes solve most problems - 90% of slow queries are missing an appropriate index
Keep predicates sargable - Never wrap indexed columns in functions
Reduce the working set early - Filter and limit rows before joins and aggregations
Statistics drive the optimizer - Stale statistics produce bad plans regardless of indexes

Reading EXPLAIN Plans

-- PostgreSQL: EXPLAIN ANALYZE shows actual execution times
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.order_id, c.customer_name, o.total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2025-01-01'
  AND o.status = 'shipped';

-- Key output to examine:
-- Seq Scan on orders  (cost=0.00..18450.00 rows=5200 width=48) (actual time=0.03..112.5 rows=5180 loops=1)
--   Filter: ((order_date >= '2025-01-01') AND (status = 'shipped'))
--   Rows Removed by Filter: 994820
--   Buffers: shared hit=8450
-- Planning Time: 0.15 ms
-- Execution Time: 115.8 ms

See explain-plan-reading.md for:

EXPLAIN vs EXPLAIN ANALYZE differences and when to use each
Understanding cost estimates, row counts, and width
Identifying sequential scans, index scans, and bitmap scans
Reading join algorithm nodes (Nested Loop, Hash Join, Merge Join)

Index Selection and Design

-- Composite index: column order matters for range queries
-- Put equality columns first, range columns last
CREATE INDEX idx_orders_status_date
ON orders (status, order_date);

-- Covering index: includes all columns the query needs
-- Avoids heap table lookups entirely
CREATE INDEX idx_orders_covering
ON orders (status, order_date)
INCLUDE (customer_id, total_amount);

-- Partial index: smaller index, faster lookups
CREATE INDEX idx_orders_pending
ON orders (order_date)
WHERE status = 'pending';

See index-optimization.md for:

When to create an index and when indexes hurt performance
Composite index column ordering rules (equality-range-sort)
Covering indexes with INCLUDE columns
Partial and filtered indexes for selective queries

Sargable Predicates

-- NON-SARGABLE: function wraps the indexed column, full table scan
SELECT * FROM orders
WHERE YEAR(order_date) = 2025 AND MONTH(order_date) = 6;

-- SARGABLE: direct comparison on indexed column, index seek
SELECT * FROM orders
WHERE order_date >= '2025-06-01'
  AND order_date < '2025-07-01';

See sargable-queries.md for:

Full list of sargable vs non-sargable predicate patterns
Function-on-column traps and their rewrites
Implicit type casting that prevents index usage
LIKE prefix matching vs wildcard-leading patterns

Query Rewriting Techniques

-- EXISTS is typically faster than IN for correlated checks on large tables
-- BAD: IN with subquery scans the full subquery result set
SELECT c.customer_id, c.customer_name
FROM customers c
WHERE c.customer_id IN (
    SELECT o.customer_id FROM orders o WHERE o.total_amount > 500
);

-- BETTER: EXISTS short-circuits on first match per row
SELECT c.customer_id, c.customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
      AND o.total_amount > 500
);

See query-rewriting.md for:

EXISTS vs IN performance characteristics and NULL behavior
CTE materialization fences and when CTEs hurt performance
LATERAL joins for correlated subquery replacement
Batch operations and row-at-a-time elimination

JOIN Optimization

-- Let the optimizer choose the join algorithm, but understand the trade-offs
-- Nested Loop: best for small outer + indexed inner (< 1000 outer rows)
-- Hash Join: best for large unsorted sets with equality predicates
-- Merge Join: best when both sides are pre-sorted or indexed

-- Rewrite to reduce the working set BEFORE joining
WITH recent_orders AS (
    SELECT order_id, customer_id, total_amount
    FROM orders
    WHERE order_date >= '2025-01-01'
)
SELECT c.customer_name, ro.total_amount
FROM recent_orders ro
JOIN customers c ON ro.customer_id = c.customer_id;

See join-optimization.md for:

Join algorithm selection criteria and performance profiles
Join order and how the optimizer chooses driving tables
Statistics impact on join strategy selection
Join hints and when to override the optimizer

Table Partitioning

-- PostgreSQL: declarative range partitioning on a 500M-row table
CREATE TABLE events (
    event_id    BIGSERIAL,
    event_date  DATE NOT NULL,
    event_type  TEXT NOT NULL,
    payload     JSONB
) PARTITION BY RANGE (event_date);

CREATE TABLE events_2025_q1 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE events_2025_q2 PARTITION OF events
    FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');

-- Queries that filter on event_date automatically prune partitions
SELECT * FROM events
WHERE event_date BETWEEN '2025-02-01' AND '2025-02-28';
-- Only scans events_2025_q1, skips all other partitions

See partitioning-strategies.md for:

Range, list, and hash partitioning use cases
Partition pruning mechanics and verifying pruning in EXPLAIN
Partition maintenance (adding, dropping, detaching)
Sub-partitioning and composite partition keys

Statistics Maintenance

-- PostgreSQL: manually refresh statistics after bulk loads
ANALYZE orders;
ANALYZE VERBOSE orders;  -- shows column-level stats gathered

-- Check when stats were last updated
SELECT
    schemaname, relname,
    last_analyze, last_autoanalyze,
    n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';

See statistics-maintenance.md for:

ANALYZE (PostgreSQL) and UPDATE STATISTICS (SQL Server) usage
Auto-vacuum thresholds and tuning parameters
Histogram buckets and most-common-values lists
Detecting stale statistics and cardinality misestimates

Cross-Database Compatibility

Feature	PostgreSQL	MySQL	SQL Server
Execution plan	`EXPLAIN (ANALYZE, BUFFERS)`	`EXPLAIN ANALYZE` (8.0+)	`SET STATISTICS PROFILE ON`
Create index concurrently	`CREATE INDEX CONCURRENTLY`	`ALTER TABLE ... ALGORITHM=INPLACE`	`CREATE INDEX ... WITH (ONLINE=ON)`
Partial/filtered index	`WHERE clause` on index	Not supported	`WHERE clause` on index
Covering index	`INCLUDE (cols)`	Implicit (InnoDB secondary)	`INCLUDE (cols)`
Update statistics	`ANALYZE table`	`ANALYZE TABLE t`	`UPDATE STATISTICS t`
Partition pruning	Automatic (declarative)	Automatic	Automatic (partition elimination)
Index hints	`SET enable_seqscan=off`	`USE INDEX (idx)`	`WITH (INDEX(idx))`
Parallel query	Automatic (workers)	Limited (8.0+)	`MAXDOP` hint

Anti-Patterns to Avoid

Avoid	Use Instead	Why
`SELECT *` in production queries	Explicit column list	Prevents covering index usage, transfers excess data
`WHERE UPPER(email) = 'FOO'`	Expression index or `citext` type	Wrapping column in function prevents index seek
`ORDER BY RAND()` for random rows	`TABLESAMPLE` or offset-based random	Full table scan and sort on every execution
Correlated subquery in SELECT list	JOIN with aggregation	Executes subquery once per outer row (N+1 problem)
`NOT IN (subquery)` with NULLs	`NOT EXISTS` or `LEFT JOIN WHERE NULL`	NULL in subquery makes entire predicate UNKNOWN
Index on every column	Targeted composite indexes	Write overhead, storage waste, optimizer confusion
`OFFSET 10000 ROWS` for deep paging	Keyset pagination (`WHERE id > last_seen`)	Offset still scans and discards skipped rows
Hint-driven query plans	Fix statistics and indexes first	Hints become stale as data distribution changes

Performance Benchmarking

Always test with production-scale data; plans change dramatically at different row counts
Compare Seq Scan cost vs Index Scan cost in EXPLAIN output
Watch for "Rows Removed by Filter" -- a high number signals a missing or wrong index
Monitor shared hit vs shared read in BUFFERS output to gauge cache effectiveness
Use pg_stat_statements (PostgreSQL) or Query Store (SQL Server) to track slow queries over time
Re-ANALYZE after bulk loads, schema changes, or large DELETE operations

source: PostgreSQL docs, MySQL optimization guide, SQL Server Query Processing Architecture, Use The Index Luke (Markus Winand)

sql-query-optimizationSafety 95Repository

Package Files

SQL Query Optimization

Core Principles

Reading EXPLAIN Plans

Index Selection and Design

Sargable Predicates

Query Rewriting Techniques

JOIN Optimization

Table Partitioning

Statistics Maintenance

Cross-Database Compatibility

Anti-Patterns to Avoid

Performance Benchmarking

Install

AI Quality Score

Metadata

Tags

sql-query-optimizationSafety 95Repository ShareFavorite skill

Package Files

SQL Query Optimization

Core Principles

Reading EXPLAIN Plans

Index Selection and Design

Sargable Predicates

Query Rewriting Techniques

JOIN Optimization

Table Partitioning

Statistics Maintenance

Cross-Database Compatibility

Anti-Patterns to Avoid

Performance Benchmarking

Install

AI Quality Score

Metadata

Tags

sql-query-optimizationSafety 95Repository