Debug Skill
Structured debugging with expert domain knowledge. Systematic, not random.
Usage
/debug <description or error>
Philosophy
SYSTEMATIC DEBUGGING
- Never guess randomly
- Reproduce before fixing
- One hypothesis at a time
- Prove the fix with a test
Workflow
┌─────────────────────────────────────┐
│ Step 1: Reproduce │
│ (Confirm error is reproducible) │
└─────────────────┬───────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 2: Isolate │
│ (Smallest failing case) │
└─────────────────┬───────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 3: Trace │
│ (Read traceback bottom-to-top) │
└─────────────────┬───────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 4: Hypothesize │
│ (Form theory, test minimally) │
└─────────────────┬───────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 5: Fix │
│ (Apply fix with regression test) │
└─────────────────┬───────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 6: Prevent │
│ (Add guards, improve errors) │
└─────────────────────────────────────┘
Step 1: Reproduce
CRITICAL: Do not proceed until you can reproduce the error.
Questions to Ask
1. What are the exact steps to reproduce?
2. What environment (Python version, OS, dependencies)?
3. What input data causes the error?
4. Is it consistent or intermittent?
5. When did it start happening?
Reproduction Script
# tmp/reproduce_bug.py
"""
Reproduction script for: [bug description]
Expected: [what should happen]
Actual: [what happens instead]
"""
# Minimal setup
from src.module import function_that_fails
# Exact input that causes error
input_data = {...}
# Trigger the bug
result = function_that_fails(input_data)
Verify Reproduction
python tmp/reproduce_bug.py
# Should see the error
Step 2: Isolate
Find the smallest case that fails:
Bisection
Full input fails → Half input fails? → Quarter input fails?
Continue until minimal failing case found
Isolation Questions
1. Which specific input field causes it?
2. Which line of code is the entry point?
3. Can we trigger it with a unit test?
Minimal Failing Test
def test_minimal_reproduction():
"""Minimal case that reproduces the bug."""
# This should fail until bug is fixed
result = function(minimal_input)
assert result == expected
Step 3: Trace
Read the traceback systematically:
Bottom-to-Top Reading
Traceback (most recent call last):
File "main.py", line 10, in <module> ← Start of call chain
process_data(data)
File "processor.py", line 25, in process_data
result = transform(data)
File "transform.py", line 42, in transform
return data["missing_key"] ← ACTUAL ERROR LOCATION
KeyError: 'missing_key' ← ERROR TYPE AND MESSAGE
Key Questions
1. What is the exact exception type?
2. What is the failing line?
3. What are the variable values at that point?
4. What called this function?
5. What data was passed?
Add Debugging Output
# Temporary debugging (remove after fix)
import json
print(f"DEBUG: data = {json.dumps(data, indent=2, default=str)}")
Step 4: Hypothesize
Form a theory and test it minimally:
Hypothesis Template
IF [condition] THEN [error occurs] BECAUSE [mechanism]
Example:
IF data["key"] is missing THEN KeyError BECAUSE API changed response format
Test Hypothesis
# Quick test of hypothesis
if "key" in data:
print("Key exists - hypothesis wrong")
else:
print("Key missing - hypothesis confirmed")
Common Bug Patterns
Null/None:
# Check for None before access
if obj is None:
# handle None case
Type Errors:
# Check types
print(f"Type of data: {type(data)}")
# Expected: dict, got: list
Async Issues:
# Race conditions, order issues
# Add logging with timestamps
import time
print(f"{time.time()}: Starting operation")
State Issues:
# Shared state modified unexpectedly
# Add state logging
print(f"State before: {state}")
# ... operation ...
print(f"State after: {state}")
Step 5: Fix
Apply the fix with a test that proves it works:
TDD the Fix
# 1. Write test that fails
def test_handles_missing_key():
data = {"other_key": "value"} # missing expected key
result = function(data)
assert result == default_value # or appropriate handling
# 2. Run test - should fail
# pytest tests/test_fix.py -v
# 3. Apply fix
def function(data):
return data.get("key", default_value)
# 4. Run test - should pass
# pytest tests/test_fix.py -v
Fix Principles
- Minimal change to fix the issue
- Don't refactor while fixing
- Match existing code style
- Add error handling, not just fix symptoms
Step 6: Prevent
Make this bug impossible or obvious in the future:
Regression Test
def test_regression_issue_123():
"""Regression test for issue #123 - KeyError on missing field."""
# This test ensures the bug doesn't return
data = {"incomplete": "data"}
result = function(data)
assert result is not None
Improve Error Messages
# Instead of KeyError, provide context
if "required_field" not in data:
raise ValueError(
f"Missing 'required_field' in data. "
f"Got keys: {list(data.keys())}"
)
Add Type Hints
def function(data: dict[str, Any]) -> Result:
"""Process data with required fields."""
...
Add Validation
from pydantic import BaseModel
class InputData(BaseModel):
required_field: str
optional_field: str = "default"
def function(data: dict) -> Result:
validated = InputData(**data) # Fails fast with clear error
...
Expert Mode Integration
Domain-specific debugging patterns are auto-injected:
Voice/Telephony (call-screener)
- Audio buffer underruns
- WebSocket disconnections
- TTS timing issues
- Async event ordering
Diagnostics (Tesla/autodiag)
- Data freshness issues
- Alert correlation problems
- Rule logic errors
- ETL pipeline failures
Multi-LLM (llm-council)
- Response format mismatches
- Rate limiting
- Timeout handling
- Consensus algorithm bugs
Debugging Tools
Python Debugger
# Add breakpoint
import pdb; pdb.set_trace()
# Or in Python 3.7+
breakpoint()
Logging
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
logger.debug(f"Processing: {data}")
Print Debugging (temporary)
# Use tmp/ folder for debug output
from pathlib import Path
Path("tmp/debug.log").write_text(str(data))
Example
/debug "TypeError: 'NoneType' object is not subscriptable in process_user"
Step 1 - Reproduce:
→ Run process_user(user_id=123)
→ Error confirmed: TypeError on line 42
Step 2 - Isolate:
→ Error occurs when user has no profile
→ Minimal case: user with profile=None
Step 3 - Trace:
→ Line 42: return user.profile["name"]
→ user.profile is None
→ None["name"] causes TypeError
Step 4 - Hypothesize:
→ IF user.profile is None THEN error occurs
→ Test: check database - user 123 has profile=NULL
Step 5 - Fix:
→ Write test for user without profile
→ Fix: if user.profile: return user.profile["name"]
→ Test passes
Step 6 - Prevent:
→ Add regression test
→ Improve error: "User 123 has no profile"
→ Add type hint: Optional[Profile]
