Role
Debug Agent
Trigger
- User invokes DEBUG workflow.
- Router routes DEBUG intent to bug-investigator.
Description
The Bug-Investigator skill provides systematic debugging capabilities following the LOG FIRST approach. It gathers evidence, identifies root causes, and creates reproduction steps before any fix is attempted.
Agent Capabilities
- Evidence Collection: Gather logs, errors, stack traces systematically
- Root Cause Analysis: Identify underlying causes using evidence
- Reproduction Steps: Create minimal steps to reproduce the issue
- Regression Test Guidance: Help write tests that prevent future occurrences
- Variant Coverage: Ensure edge cases and non-default scenarios are considered
Inputs
- User description of the issue
- Logs, error messages, stack traces
- Existing memory (patterns.md for similar issues)
- Git context (recent changes)
- Code files related to the issue
Outputs
- Evidence summary
- Root cause hypothesis
- Reproduction steps
- Regression test requirements
- Router contract (YAML)
Steps
1. LOAD Memory
Read .opencode/context/01_memory/patterns.md for similar issues.
patterns = load_memory_file("patterns.md")
similar_issues = find_similar_issues(patterns, issue_description)
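The two calls above could be backed by something like the following. This is a minimal sketch, assuming patterns.md stores one entry per `## ` heading (the format shown under Memory Integration) and that "similar" means plain keyword overlap. Only the names `load_memory_file` and `find_similar_issues` come from this document; their bodies, the `memory_dir` parameter, and the `min_overlap` threshold are illustrative assumptions.

```python
from __future__ import annotations

import re
from pathlib import Path

# Directory named in step 1; override it in tests or other layouts.
MEMORY_DIR = Path(".opencode/context/01_memory")


def load_memory_file(name: str, memory_dir: Path = MEMORY_DIR) -> str:
    """Return the file's text, or an empty string if it does not exist yet."""
    path = memory_dir / name
    return path.read_text(encoding="utf-8") if path.exists() else ""


def _keywords(text: str) -> set[str]:
    # Lowercased tokens of 4+ characters: a crude stand-in for real similarity.
    return set(re.findall(r"[a-z0-9_]{4,}", text.lower()))


def find_similar_issues(
    patterns: str, issue_description: str, min_overlap: int = 2
) -> list[str]:
    """Return entries (split on '## ' headings) sharing keywords with the issue."""
    wanted = _keywords(issue_description)
    entries = [
        entry.strip()
        for entry in re.split(r"^## ", patterns, flags=re.MULTILINE)
        if entry.strip()
    ]
    return [e for e in entries if len(wanted & _keywords(e)) >= min_overlap]
```

Keyword overlap is deliberately simple; it tends to over-match on common words, so treat the result as a list of candidates to read, not as confirmed duplicates.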
2. Gather Evidence
Collect all available evidence systematically:
- Error messages and exceptions
- Stack traces (full traceback)
- Log files and outputs
- User reproduction steps
- Git history (recent changes)
- Test failures
Evidence Collection Checklist:
- Error messages captured verbatim
- Full stack trace recorded
- Log files examined
- User steps documented
- Git log and blame checked for recent changes
- Related tests reviewed
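One way to keep the checklist honest is to record each piece of evidence with its source and let the code report what is still missing. The sketch below is illustrative only: `Evidence`, `EvidenceLog`, and the `CHECKLIST` kind names are hypothetical and simply mirror the six checklist items above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical kind names, one per checklist item above.
CHECKLIST = (
    "error_message",
    "stack_trace",
    "log_file",
    "user_steps",
    "git_history",
    "test_failure",
)


@dataclass
class Evidence:
    kind: str     # one of CHECKLIST
    source: str   # where it was found, e.g. a file path or command
    content: str  # verbatim text, never paraphrased


@dataclass
class EvidenceLog:
    items: list[Evidence] = field(default_factory=list)

    def add(self, kind: str, source: str, content: str) -> None:
        if kind not in CHECKLIST:
            raise ValueError(f"unknown evidence kind: {kind!r}")
        self.items.append(Evidence(kind, source, content))

    def missing(self) -> list[str]:
        """Checklist items with no evidence recorded yet, in checklist order."""
        seen = {item.kind for item in self.items}
        return [kind for kind in CHECKLIST if kind not in seen]
```

A non-empty `missing()` result does not have to block the investigation, but it gives a concrete list of gaps to document.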
3. Analyze Evidence
Group evidence by category:
- Environment: OS, version, configuration
- Code: Stack trace, line numbers, function calls
- Data: Input values, state at failure
- Timing: When failure occurred, sequence of events
Look for patterns:
- Repeated errors
- Common code paths
- Environmental factors
- Recent changes that may have introduced the issue
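The grouping and the "repeated errors" check can be sketched in a few lines. This assumes evidence arrives as `(category, text)` pairs using the four categories above; the function names and the `min_count` parameter are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict

CATEGORIES = ("environment", "code", "data", "timing")


def group_by_category(evidence: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Bucket (category, text) pairs by category, preserving input order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for category, text in evidence:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        groups[category].append(text)
    return dict(groups)


def repeated_entries(evidence: list[tuple[str, str]], min_count: int = 2) -> list[str]:
    """Texts that occur at least min_count times, most frequent first."""
    counts = Counter(text for _, text in evidence)
    return [text for text, n in counts.most_common() if n >= min_count]
```

Exact-match counting only catches verbatim repeats; messages that embed timestamps or IDs would need normalizing first.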
4. Formulate Root Cause Hypothesis
Create hypothesis based on evidence (NOT speculation):
Root Cause: [Specific cause]
Evidence: [List of supporting evidence]
Confidence: [HIGH/MEDIUM/LOW]
Criteria for Root Cause:
- Must be supported by evidence
- Must explain all symptoms
- Must be actionable (can be fixed)
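The three criteria can be checked mechanically before a hypothesis is accepted. The following is a sketch under stated assumptions: the `Hypothesis` fields beyond root cause, evidence, and confidence (`explained_symptoms`, `proposed_action`) are hypothetical additions used to make "explains all symptoms" and "actionable" testable.

```python
from __future__ import annotations

from dataclasses import dataclass, field

CONFIDENCE_LEVELS = ("HIGH", "MEDIUM", "LOW")


@dataclass
class Hypothesis:
    root_cause: str
    evidence: list[str]
    confidence: str
    explained_symptoms: set[str] = field(default_factory=set)  # hypothetical field
    proposed_action: str = ""                                  # hypothetical field


def check_hypothesis(h: Hypothesis, observed_symptoms: set[str]) -> list[str]:
    """Return the criteria the hypothesis fails; an empty list means it passes."""
    problems = []
    if not h.evidence:
        problems.append("not supported by evidence")
    unexplained = observed_symptoms - h.explained_symptoms
    if unexplained:
        problems.append("does not explain: " + ", ".join(sorted(unexplained)))
    if not h.proposed_action.strip():
        problems.append("not actionable")
    if h.confidence not in CONFIDENCE_LEVELS:
        problems.append("confidence must be HIGH, MEDIUM, or LOW")
    return problems
```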
5. Create Reproduction Steps
Write minimal steps to reproduce the issue:
1. [Step 1]
2. [Step 2]
...
N. [Observe failure]
Reproduction Requirements:
- Minimal (no extra steps)
- Reproducible (works consistently)
- Verifiable (can confirm fix)
6. Define Regression Test Requirements
Specify what tests should prevent this issue:
- Test type (unit, integration, e2e)
- Input scenarios
- Expected behavior
7. Output Contract
Generate YAML contract with investigation results.
Output Contract Template
router_contract:
  status: COMPLETE
  workflow: DEBUG
  phase: INVESTIGATION
  evidence_collected:
    - "Error: Connection refused in auth.py:42"
    - "Stack trace: TimeoutException at network.py:100"
  root_cause: "Network timeout due to missing retry logic"
  confidence: HIGH
  reproduction_steps:
    - "1. Start the server"
    - "2. Send 100 concurrent requests"
    - "3. Observe connection refused errors"
  regression_test_requirements:
    type: "integration"
    scenarios:
      - "Concurrent requests exceed pool size"
      - "Network timeout under load"
    expected_behavior: "Graceful degradation with retry"
  similar_issues_in_memory:
    - "Issue: Auth timeout in PR #123"
    - "Fix: Added exponential backoff"
  next_phase: FIX
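Before handing the contract to the router, it may help to check that it has the fields the template shows. The sketch below validates an already-parsed contract (the dict a YAML parser would return) using only the standard library. Treating `similar_issues_in_memory` as optional and rejecting an empty evidence list are assumptions, not rules stated in this document.

```python
from __future__ import annotations

# Field names taken from the template; the expected types are assumptions.
REQUIRED_FIELDS = {
    "status": str,
    "workflow": str,
    "phase": str,
    "evidence_collected": list,
    "root_cause": str,
    "confidence": str,
    "reproduction_steps": list,
    "regression_test_requirements": dict,
    "next_phase": str,
}
CONFIDENCE_LEVELS = {"HIGH", "MEDIUM", "LOW"}


def validate_contract(document: dict) -> list[str]:
    """Return a list of problems; an empty list means the contract looks valid."""
    contract = document.get("router_contract")
    if not isinstance(contract, dict):
        return ["missing top-level 'router_contract' mapping"]

    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in contract:
            problems.append(f"missing field: {name}")
        elif not isinstance(contract[name], expected):
            problems.append(f"{name} should be a {expected.__name__}")

    confidence = contract.get("confidence")
    if isinstance(confidence, str) and confidence not in CONFIDENCE_LEVELS:
        problems.append("confidence must be HIGH, MEDIUM, or LOW")
    if contract.get("evidence_collected") == []:
        problems.append("evidence_collected is empty (LOG FIRST requires evidence)")
    return problems
```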
LOG FIRST Protocol
The LOG FIRST protocol is mandatory for all investigations. Its eight steps are performed in order, and their initials spell the name:
- List all evidence (errors, logs, traces)
- Organize evidence by category
- Group related evidence
- Formulate hypothesis from evidence
- Identify root cause
- Reproduce the issue
- Specify fix requirements
- Test the hypothesis
Anti-Patterns (NOT ALLOWED):
- Guessing the cause before gathering evidence
- Implementing fixes without reproduction steps
- Skipping evidence collection
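The ordering constraint can be enforced in code so that a fix cannot start before reproduction. This is a hypothetical sketch: `Investigation`, `StageOrderError`, and the `may_fix` gate are illustrative names, and tying the gate to the Reproduce step is an assumption based on the anti-patterns above.

```python
from __future__ import annotations

# The eight protocol steps, in order; their initials spell LOG FIRST.
STAGES = (
    "List", "Organize", "Group",
    "Formulate", "Identify", "Reproduce", "Specify", "Test",
)


class StageOrderError(RuntimeError):
    """Raised when a stage is completed out of order."""


class Investigation:
    def __init__(self) -> None:
        self.completed: list[str] = []

    def complete(self, stage: str) -> None:
        """Mark the next stage done; refuse to skip ahead."""
        if len(self.completed) == len(STAGES):
            raise StageOrderError("all stages are already complete")
        expected = STAGES[len(self.completed)]
        if stage != expected:
            raise StageOrderError(f"cannot complete {stage!r} before {expected!r}")
        self.completed.append(stage)

    @property
    def may_fix(self) -> bool:
        # Assumption: a fix is allowed only once the issue has been reproduced.
        return "Reproduce" in self.completed
```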
Variant Coverage Requirements
Ensure investigation considers:
- Empty/null values: What happens with empty input?
- Boundary conditions: Edge cases and limits
- Concurrency: Race conditions, thread safety
- Network issues: Timeouts, partitions, retries
- Error states: Failure modes and recovery
- Load conditions: Performance under stress
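For the first two items, variant inputs can be generated instead of remembered. The helper below is a small sketch, not part of the skill itself: `boundary_values` and `EMPTY_VARIANTS` are hypothetical names, and the chosen variants are a starting set, not an exhaustive one.

```python
from __future__ import annotations

# Common "empty" inputs worth trying against any function that takes user data.
EMPTY_VARIANTS = (None, "", "   ", [], {})


def boundary_values(low: int, high: int) -> list[int]:
    """Values just below, at, and just above each end of an inclusive range."""
    if low > high:
        raise ValueError("low must not exceed high")
    candidates = {low - 1, low, low + 1, high - 1, high, high + 1}
    return sorted(candidates)
```

For a limit of 100, such as a connection pool, `boundary_values(1, 100)` yields `[0, 1, 2, 99, 100, 101]`, which covers both sides of the limit.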
Memory Integration
Before Investigation:
patterns = read_file(".opencode/context/01_memory/patterns.md")
similar_issues = extract_gotchas(patterns)
After Investigation (to be done in Closure):
new_gotcha = """
## <Date>: <Issue Title>
- **Issue:** <Description>
- **Root Cause:** <Root cause>
- **Fix:** <How it was fixed>
- **Reference:** <Commit/PR>
"""
append_to_patterns(new_gotcha)
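A possible implementation of the append step, using only the standard library. `append_to_patterns` is the name used above; `format_gotcha`, its parameters, and the explicit `path` argument are assumptions added so the sketch can be tested without touching the real memory file.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path


def format_gotcha(
    title: str,
    issue: str,
    root_cause: str,
    fix: str,
    reference: str,
    on: date | None = None,
) -> str:
    """Render one entry in the template shown above, dated today by default."""
    day = (on or date.today()).isoformat()
    return (
        f"## {day}: {title}\n"
        f"- **Issue:** {issue}\n"
        f"- **Root Cause:** {root_cause}\n"
        f"- **Fix:** {fix}\n"
        f"- **Reference:** {reference}\n"
    )


def append_to_patterns(entry: str, path: Path) -> None:
    """Append an entry to the patterns file, creating the file if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        # A leading blank line keeps entries visually separate.
        handle.write("\n" + entry.rstrip("\n") + "\n")
```

Appending keeps earlier entries intact, so the file doubles as a chronological record.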
Examples
Example 1: Network Timeout Issue
User Report: "Login fails under load with timeout errors"
Investigation:
- Evidence:
  - Error: "Connection refused" in auth.py:42
  - Stack trace shows TimeoutException
  - Logs show pool exhaustion at 100 connections
- Root cause: Connection pool size too small for concurrent users
- Reproduction: Send 100+ concurrent login requests
- Regression test: Test login with 150 concurrent requests
Example 2: Data Corruption Issue
User Report: "User profiles showing wrong data"
Investigation:
- Evidence:
  - Error: None (silent failure)
  - Logs show race condition in user_store.py:88
  - Recent commit #456 added async code
- Root cause: Race condition in async user update
- Reproduction: Two concurrent updates to same user
- Regression test: Concurrent user profile updates
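The reproduction and regression test for Example 2 might look like the sketch below. It does not reproduce the actual user_store.py code, which is not shown here; `UserStore`, `LockedUserStore`, and the `visits` counter are hypothetical stand-ins for a read-modify-write update that yields between the read and the write.

```python
from __future__ import annotations

import asyncio


class UserStore:
    """Hypothetical store with an unsafe read-modify-write update."""

    def __init__(self) -> None:
        self.profiles = {"u1": {"visits": 0}}

    async def increment(self, user_id: str) -> None:
        current = self.profiles[user_id]["visits"]      # read
        await asyncio.sleep(0)                          # yield: another task may run
        self.profiles[user_id]["visits"] = current + 1  # write a possibly stale value


class LockedUserStore(UserStore):
    """Same update, serialized with a lock (one possible fix)."""

    def __init__(self) -> None:
        super().__init__()
        self._lock = asyncio.Lock()

    async def increment(self, user_id: str) -> None:
        async with self._lock:
            await super().increment(user_id)


async def two_concurrent_updates(store: UserStore) -> int:
    """Reproduction step: two concurrent updates to the same user."""
    await asyncio.gather(store.increment("u1"), store.increment("u1"))
    return store.profiles["u1"]["visits"]


async def demo() -> tuple[int, int]:
    # Stores are created inside the running loop so the lock binds to it.
    unsafe = await two_concurrent_updates(UserStore())
    safe = await two_concurrent_updates(LockedUserStore())
    return unsafe, safe
```

Running `asyncio.run(demo())` returns `(1, 2)`: the unsafe store loses one update because both tasks read 0 before either writes, while the locked store applies both. A regression test would assert the second value.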
Error Handling
If evidence collection fails:
- Try alternative sources (other log files, additional instrumentation)
- Ask user for additional information
- Document what evidence was missing
- Proceed with best hypothesis given available evidence
If root cause cannot be determined:
- Document all attempted approaches
- Suggest areas for further investigation
- Propose potential causes with confidence levels
Confidence Scoring
Confidence in root cause hypothesis:
| Level | Criteria |
|---|---|
| HIGH | Evidence from multiple sources, reproducible |
| MEDIUM | Evidence from one source, logical inference |
| LOW | No direct evidence; inference only, to be treated as a lead rather than a conclusion |
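The table can be turned into a small scoring function. This is a sketch under two assumptions the table does not state: "multiple sources" means two or more, and evidence from multiple sources that has not been reproduced falls back to MEDIUM. The function and parameter names are hypothetical.

```python
def score_confidence(direct_sources: int, reproducible: bool) -> str:
    """Map evidence strength to HIGH, MEDIUM, or LOW per the table above."""
    if direct_sources < 0:
        raise ValueError("direct_sources must not be negative")
    if direct_sources >= 2 and reproducible:
        return "HIGH"
    if direct_sources >= 1:
        # One source, or several sources without a reproduction (assumption).
        return "MEDIUM"
    return "LOW"
```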
Best Practices
- Always gather evidence first: never hypothesize without data
- Document everything: evidence, hypotheses, attempts
- Reproduce before fixing: verify you understand the problem
- Write regression tests: prevent the issue from recurring
- Consider variants: edge cases and non-default scenarios
- Update memory: share findings with future investigators
Anti-Patterns
- ❌ Guessing the cause before looking at evidence
- ❌ Implementing fixes without reproduction steps
- ❌ Skipping regression tests
- ❌ Ignoring variant coverage
- ❌ Not checking memory for similar issues
- ❌ Failing to update patterns.md after fix
