Pytest Profiling & Optimization
Systematic workflow for diagnosing, fixing, and reporting on slow Python/pytest test suites. Measure first, optimize based on data, never guess.
Before Starting
- Read project conventions: check AGENTS.md, CLAUDE.md, or pyproject.toml to understand:
  - Which test directories exist (unit, integration, e2e)
  - How to run tests (uv run pytest, pytest, make test, etc.)
  - Test database setup and fixture patterns
- Ask the user:
  - Which test directory or marker to profile (or profile the full suite)
  - Where to write the report file
- Check for pyinstrument. If it is not installed, ask the user before adding it:
uv pip list | grep pyinstrument
# If missing, ask before running:
uv add --dev pyinstrument
Phase 1: Measure
All steps are read-only. No code changes.
Step 1 — Baseline timing + slowest tests
uv run pytest <test_dir> --durations=20 -q
Record:
- Total wall time
- Number of tests (passed, failed, skipped)
- Top 20 slowest phases (setup, call, teardown)
Note whether the slowest items are setup, call, or teardown — this determines where to dig.
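To decide where to dig, it can help to total the reported durations per phase. The following is a minimal sketch that assumes each `--durations` line has the form `<seconds>s <phase> <test id>`. The helper name `summarize_durations` and the sample output are hypothetical and do not come from a real run.

```python
import re
from collections import defaultdict

# Assumed shape of one `pytest --durations` line:
#   1.20s setup    tests/test_api.py::test_login
DURATION_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(setup|call|teardown)\s+(\S+)")


def summarize_durations(report: str) -> dict[str, float]:
    """Sum the reported seconds per phase (setup, call, teardown)."""
    totals: dict[str, float] = defaultdict(float)
    for line in report.splitlines():
        match = DURATION_LINE.match(line)
        if match:
            seconds, phase, _test_id = match.groups()
            totals[phase] += float(seconds)
    return dict(totals)


# Hypothetical sample output, for illustration only.
SAMPLE = """\
1.20s setup    tests/test_api.py::test_login
0.30s call     tests/test_api.py::test_login
0.90s setup    tests/test_api.py::test_logout
0.25s teardown tests/test_api.py::test_login
"""

if __name__ == "__main__":
    totals = summarize_durations(SAMPLE)
    # Print the most expensive phase first.
    for phase, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{phase:<9}{seconds:.2f}s")
```

If setup dominates, as in this made-up sample, the fixture chain in Step 2 is the likely place to look next.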
Step 2 — Fixture setup/teardown chains
Pick 2-3 of the slowest tests and run:
uv run pytest <test_dir> -k "name_of_slow_test" -q --setup-show
This shows the full fixture dependency chain. Look for:
- Function-scoped fixtures that could be session-scoped
- Expensive fixtures called repeatedly (DB setup, app creation, crypto)
- Long teardown sequences
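Counting how often each fixture is set up makes repeated expensive setups easier to spot. This sketch assumes `--setup-show` prints lines such as `SETUP    F db_session`, where the letter is the scope (S, P, M, C, or F). The function name `count_setups` and the sample text are hypothetical.

```python
import re
from collections import Counter

# Assumed shape of a `--setup-show` line: "SETUP    F db_session (fixtures used: ...)"
SETUP_LINE = re.compile(r"^\s*SETUP\s+([SPMCF])\s+(\w+)")
SCOPES = {"S": "session", "P": "package", "M": "module", "C": "class", "F": "function"}


def count_setups(output: str) -> Counter:
    """Count SETUP events per (fixture name, scope)."""
    counts: Counter = Counter()
    for line in output.splitlines():
        match = SETUP_LINE.match(line)
        if match:
            scope_letter, fixture = match.groups()
            counts[(fixture, SCOPES[scope_letter])] += 1
    return counts


# Hypothetical sample output, for illustration only.
SAMPLE = """\
SETUP    S engine
    SETUP    M app
        SETUP    F db_session (fixtures used: engine)
        tests/test_users.py::test_create (fixtures used: app, db_session, engine).
        TEARDOWN F db_session
        SETUP    F db_session (fixtures used: engine)
        tests/test_users.py::test_delete (fixtures used: app, db_session, engine).
        TEARDOWN F db_session
    TEARDOWN M app
TEARDOWN S engine
"""

if __name__ == "__main__":
    for (fixture, scope), n in count_setups(SAMPLE).most_common():
        print(f"{fixture} ({scope}): set up {n} time(s)")
```

A function-scoped fixture with a high count and a high per-call cost is a candidate for a wider scope, provided tests do not mutate it.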
Step 3 — CPU profiling with pyinstrument
uv run pyinstrument -r text -m pytest <test_dir> -k "name_of_slow_test" -q
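The call tree shows where time is spent, function by function. pyinstrument samples wall-clock time by default, so waits on I/O (database, network) appear alongside CPU work. Repeat for 2-3 of the slowest tests to find common patterns.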
Common hotspots to look for:
- bcrypt.hashpw / bcrypt.gensalt: password hashing (~200ms per call at default 12 rounds)
- MetaData.create_all: SQLAlchemy table creation
- TRUNCATE ... CASCADE: PostgreSQL lock acquisition
- TestClient(app): ASGI lifespan enter/exit
- Module imports: heavy libraries loaded at import time (matplotlib, boto3, numpy)
- RSA key generation: rsa.generate_private_key()
- Network connections: Redis/Valkey, external services
Step 4 — Micro-benchmarks (optional)
If a specific operation is suspect, isolate and measure it:
uv run python -c "
import time
# ... setup code ...
times = []
for i in range(10):
    # perf_counter is a monotonic, high-resolution clock suited to short intervals
    start = time.perf_counter()
    # ... the suspect operation ...
    times.append(time.perf_counter() - start)
print(f'avg: {sum(times)/len(times)*1000:.1f}ms')
"
This is useful for comparing alternatives (e.g., TRUNCATE vs DELETE, bcrypt rounds 12 vs 4).
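For comparing two alternatives side by side, the same loop can be wrapped in a small helper. This is a sketch, not part of the workflow itself. The `bench` and `hash_with` names are hypothetical, and PBKDF2 from the standard library is used only as a stand-in for bcrypt so the example runs without extra dependencies.

```python
import hashlib
import statistics
import time
from typing import Callable


def bench(operation: Callable[[], object], repeats: int = 10) -> float:
    """Return the average wall time of `operation` in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.mean(samples)


def hash_with(iterations: int) -> bytes:
    """Stand-in workload: a key derivation whose cost scales with `iterations`."""
    return hashlib.pbkdf2_hmac("sha256", b"password", b"salt", iterations)


if __name__ == "__main__":
    slow = bench(lambda: hash_with(200_000))
    fast = bench(lambda: hash_with(1_000))
    print(f"high cost: {slow:.1f}ms  low cost: {fast:.1f}ms")
```

Passing the operation as a callable keeps setup code outside the timed region, which is the main thing to get right in a micro-benchmark.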
Phase 2: Analyze
Summarize findings before touching any code:
- Where is time spent? — Categorize by fixture setup, test body, teardown
- Which fixtures are expensive? — List per-call cost and how many tests use them
- Is there a common bottleneck? — One or two things that dominate across all tests
- Estimate total impact: (per-call cost) × (number of calls) = total savings
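The impact estimate can be worked through as simple arithmetic. The sketch below ranks bottlenecks by estimated total cost and expresses each as a share of wall time. All names and numbers are hypothetical and serve only to show the calculation.

```python
from dataclasses import dataclass


@dataclass
class Bottleneck:
    name: str
    per_call_ms: float
    calls: int

    @property
    def total_s(self) -> float:
        # (per-call cost) x (number of calls), converted from ms to seconds
        return self.per_call_ms * self.calls / 1000


def rank(bottlenecks: list[Bottleneck], wall_time_s: float) -> list[tuple[str, float, float]]:
    """Return (name, total seconds, percent of wall time), largest first."""
    ordered = sorted(bottlenecks, key=lambda b: b.total_s, reverse=True)
    return [(b.name, b.total_s, 100 * b.total_s / wall_time_s) for b in ordered]


# Hypothetical figures, for illustration only.
FINDINGS = [
    Bottleneck("bcrypt hashing", per_call_ms=200, calls=150),
    Bottleneck("TRUNCATE teardown", per_call_ms=300, calls=400),
]

if __name__ == "__main__":
    for name, total_s, percent in rank(FINDINGS, wall_time_s=200):
        print(f"{name}: ~{total_s:.0f}s ({percent:.0f}%)")
    # prints:
    # TRUNCATE teardown: ~120s (60%)
    # bcrypt hashing: ~30s (15%)
```

These figures are upper bounds on savings, since a fix rarely reduces the per-call cost to zero.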
Phase 3: Optimize
Design targeted fixes based on the profiling data. Apply one fix at a time and measure after each.
Common fixes
Slow bcrypt hashing
Patch bcrypt.gensalt to use minimum rounds (4) in tests via a session-scoped autouse fixture in the root tests/conftest.py:
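import bcrypt
import pytest
from unittest.mock import patch

@pytest.fixture(autouse=True, scope="session")
def _fast_bcrypt():
    original_gensalt = bcrypt.gensalt  # keep the real function

    def fast_gensalt(rounds=4, prefix=b"2b"):
        # Ignore the requested cost; always use the minimum (4 rounds).
        return original_gensalt(rounds=4, prefix=prefix)

    with patch("bcrypt.gensalt", fast_gensalt):
        yield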
Slow TRUNCATE teardown
Replace TRUNCATE ... CASCADE with DELETE FROM in reverse FK order. TRUNCATE acquires ACCESS EXCLUSIVE locks (~300ms even on empty tables). DELETE with row-level locks takes ~2ms for small row counts.
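"Reverse FK order" means deleting from child tables before the tables they reference. The sketch below derives that order with a topological sort. It uses an in-memory SQLite database purely as a stand-in so it runs without a PostgreSQL server, and the schema, `REFERENCES` mapping, and function names are hypothetical. In a SQLAlchemy project, iterating `reversed(metadata.sorted_tables)` should give an equivalent order.

```python
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables it references via foreign keys.
REFERENCES = {
    "users": set(),
    "orders": {"users"},
    "order_items": {"orders"},
}


def deletion_order(references: dict[str, set[str]]) -> list[str]:
    """Children first, so no DELETE violates a foreign key."""
    # static_order() yields referenced (parent) tables before their dependents.
    creation_order = list(TopologicalSorter(references).static_order())
    return list(reversed(creation_order))


def clear_tables(conn: sqlite3.Connection, references: dict[str, set[str]]) -> None:
    for table in deletion_order(references):
        # Table names come from a trusted constant, never from user input.
        conn.execute(f'DELETE FROM "{table}"')
    conn.commit()


def make_demo_db() -> sqlite3.Connection:
    """Build a small database with one row per table (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id));
        CREATE TABLE order_items (
            id INTEGER PRIMARY KEY,
            order_id INTEGER NOT NULL REFERENCES orders(id));
        INSERT INTO users VALUES (1);
        INSERT INTO orders VALUES (1, 1);
        INSERT INTO order_items VALUES (1, 1);
        """
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    clear_tables(conn, REFERENCES)
    print(deletion_order(REFERENCES))
```

Deriving the order from the schema, rather than hard-coding it, keeps the teardown correct when new tables are added.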
Expensive function-scoped fixtures
If a fixture is pure setup with no per-test state, consider widening its scope:
- scope="module": shared within one test file
- scope="session": shared across the entire run
Only do this if tests don't mutate the fixture's state.
Slow RSA key generation
Cache a test keypair as a session-scoped fixture instead of generating per-test.
Slow module imports
These are one-time costs and generally not worth optimizing. Note them in the report but don't act on them unless they dominate.
Measurement after each fix
After each optimization:
- Run the full suite: uv run pytest <test_dir> -q --tb=no
- Record the new wall time
- Calculate delta from previous state
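Tracking the delta after each fix can be done with a few lines of arithmetic. This sketch turns a baseline and a sequence of measured wall times into table rows with per-step deltas and an overall speedup. The function name `combined_result` and all timings are hypothetical.

```python
def combined_result(baseline_s: float, steps: list[tuple[str, float]]) -> list[str]:
    """Build table rows with the delta of each step against the previous state."""
    rows = [f"| Baseline | {baseline_s:.1f}s | - |"]
    previous = baseline_s
    for label, wall_s in steps:
        # Negative delta means the suite got faster.
        rows.append(f"| + {label} | {wall_s:.1f}s | {wall_s - previous:+.1f}s |")
        previous = wall_s
    total = previous - baseline_s
    speedup = baseline_s / previous
    rows.append(f"| **Total saved** | | **{total:+.1f}s ({speedup:.1f}x)** |")
    return rows


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    measured = [("fast bcrypt", 170.0), ("DELETE instead of TRUNCATE", 50.0)]
    print("\n".join(combined_result(200.0, measured)))
```

Comparing each step against the previous state, not the baseline, is what attributes savings to the individual fix.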
Phase 4: Commit
Commit each optimization as a separate commit with measured timings in the message:
perf(tests): <what changed>
<Why it helps.>
<previous>s → <new>s (saved ~<delta>s)
Use conventional commits (perf: prefix for performance improvements).
Phase 5: Report
Write a report to the location chosen by the user. The report must include all of the following sections:
Report template
# Test Suite Performance: Analysis & Fixes
**Date:** YYYY-MM-DD
**Result:** <before>s → <after>s (<speedup>x faster)
## Baseline
- Test command: `<exact command>`
- Total tests: X passed, Y failed (pre-existing), Z skipped
- Wall time: Xs
## Methodology
1. `pytest --durations=20` — identify slowest test phases
2. `pytest --setup-show` — trace fixture setup chains
3. `pyinstrument` — CPU call trees on slowest tests
## Profiling Findings
| Component | Time | Scope | Notes |
|---|---|---|---|
| ... | ... | ... | ... |
## Bottleneck Breakdown (estimated)
- **<bottleneck 1>:** N calls × Xms = ~Ys (Z%)
- **<bottleneck 2>:** N calls × Xms = ~Ys (Z%)
- **Actual test work:** ~Ys (Z%)
## Fixes
### Fix N: <title>
**Commit:** `<commit message>`
**File:** `<path>`
**Time saved:** <before>s → <after>s (**-Xs**)
<Description of what changed and why.>
### Combined Result
| State | Wall time | Delta |
|---|---|---|
| Baseline | Xs | — |
| + fix 1 | Xs | -Xs |
| + fix 2 | Xs | -Xs |
| **Total saved** | | **-Xs (Nx)** |
## Remaining Slow Tests
| Test | Time | Reason |
|---|---|---|
| ... | ... | ... |
Commit the report as a separate docs: commit after all optimization commits.
Key Principles
- Measure before optimizing — never guess where time is spent
- One fix at a time — measure after each change to attribute savings accurately
- Don't optimize genuine workload — raster I/O, large batch operations, etc. are expected costs
- Test correctness first — verify the same tests pass/fail before and after each fix
- Keep production code unchanged — all optimizations go in test infrastructure only
