Systematic Debugging
Random fixes waste time and create new bugs. Quick patches mask underlying issues.
Core principle: Find root cause before attempting fixes. Symptom fixes are failure.
The Iron Law
Investigate root cause before attempting any fix. If you haven't completed Phase 1, you cannot propose fixes.
When to Use
Any technical issue:
- Test failures
- Bugs in production
- Unexpected behavior
- Performance problems
- Build failures
- Integration issues
Especially when:
- Under time pressure (emergencies make guessing tempting)
- "Just one quick fix" seems obvious
- You've already tried multiple fixes
- Previous fix didn't work
The Four Phases
Complete each phase before proceeding to the next.
Phase 1: Root Cause Investigation
Before attempting any fix:
-
Read Error Messages Carefully
- Don't skip past errors or warnings
- Read stack traces completely
- Note line numbers, file paths, error codes
-
Reproduce Consistently
- Can you trigger it reliably?
- What are the exact steps?
- If not reproducible → gather more data, don't guess
-
Check Recent Changes
- Git diff, recent commits
- New dependencies, config changes
- Environmental differences
-
Gather Evidence in Multi-Component Systems
For EACH component boundary: - Log what data enters - Log what data exits - Verify environment/config propagation Run once to gather evidence showing WHERE it breaks THEN investigate that specific component -
Trace Data Flow
- Where does bad value originate?
- What called this with bad value?
- Keep tracing up until you find the source
- Fix at source, not at symptom
Phase 2: Pattern Analysis
-
Find Working Examples
- Locate similar working code in same codebase
- What works that's similar to what's broken?
-
Compare Against References
- Read reference implementation COMPLETELY
- Don't skim - read every line
-
Identify Differences
- What's different between working and broken?
- List every difference, however small
Phase 3: Hypothesis and Testing
-
Form Single Hypothesis
- "I think X is the root cause because Y"
- Be specific, not vague
-
Test Minimally
- Make the SMALLEST possible change
- One variable at a time
- Don't fix multiple things at once
-
Verify Before Continuing
- Did it work? Yes → Phase 4
- Didn't work? Form NEW hypothesis
- Avoid stacking fixes — each added change obscures which one helped
Phase 4: Implementation
-
Create Failing Test Case
- Simplest possible reproduction
- Create this before fixing — it proves the fix addresses the root cause
-
Implement Single Fix
- Address the root cause
- ONE change at a time
- No "while I'm here" improvements
-
Verify Fix
- Test passes now?
- No other tests broken?
-
Circuit Breaker — 2+ Approaches Failed
If you have attempted 2 or more distinct approaches to the same problem:
Stop and regroup — a third attempt without new information is unlikely to succeed.
Instead:
- Summarize what you tried and why each failed
- Propose 2-3 alternative approaches with tradeoffs
- Ask the user to choose before continuing
Cycling through 3-4 failed approaches wastes entire sessions and erodes confidence in each subsequent fix.
Red Flags - STOP and Return to Phase 1
If you catch yourself thinking:
- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "It's probably X, let me fix that"
- "I don't fully understand but this might work"
- "Here are the main problems: [lists fixes without investigation]"
- Proposing solutions before tracing data flow
- "One more fix attempt" — circuit breaker: stop and regroup after 2 failed approaches
Quick Reference
| Phase | Key Activities | Success Criteria |
|---|---|---|
| 1. Root Cause | Read errors, reproduce, check changes, gather evidence | Understand WHAT and WHY |
| 2. Pattern | Find working examples, compare | Identify differences |
| 3. Hypothesis | Form theory, test minimally | Confirmed or new hypothesis |
| 4. Implementation | Create test, fix, verify | Bug resolved, tests pass |
Real-World Impact
| Approach | Time to Fix | First-Time Fix Rate | New Bugs |
|---|---|---|---|
| Systematic | 15-30 min | 95% | Near zero |
| Random fixes | 2-3 hours | 40% | Common |
