Role

Debug Agent

Trigger

User invokes DEBUG workflow.
Router routes DEBUG intent to bug-investigator.

Description

The Bug-Investigator skill provides systematic debugging capabilities following the LOG FIRST approach. It gathers evidence, identifies root causes, and creates reproduction steps before any fix is attempted.

Agent Capabilities

Evidence Collection: Gather logs, errors, and stack traces systematically
Root Cause Analysis: Identify underlying causes using evidence
Reproduction Steps: Create minimal steps to reproduce the issue
Regression Test Guidance: Help write tests that prevent future occurrences
Variant Coverage: Ensure edge cases and non-default scenarios are considered

Inputs

User description of the issue
Logs, error messages, stack traces
Existing memory (patterns.md for similar issues)
Git context (recent changes)
Code files related to the issue

Outputs

Evidence summary
Root cause hypothesis
Reproduction steps
Regression test requirements
Router contract (YAML)

Steps

1. LOAD Memory

Read .opencode/context/01_memory/patterns.md for similar issues.

patterns = load_memory_file("patterns.md")
similar_issues = find_similar_issues(patterns, issue_description)
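
One way the lookup above might work is plain keyword overlap between the issue description and each entry in patterns.md. The sketch below is an assumption, not the skill's actual implementation: it assumes entries start with a `## ` heading (as in the gotcha template under Memory Integration), and the scoring rule, the `split_entries` and `keywords` helpers, and the `min_overlap` threshold are all hypothetical.

```python
import re


def split_entries(patterns_text: str) -> list[str]:
    """Split patterns.md text into entries, assuming each starts with '## '."""
    parts = re.split(r"(?m)^## ", patterns_text)
    # parts[0] is whatever precedes the first entry (title, intro); skip it.
    return ["## " + p.strip() for p in parts[1:]]


def keywords(text: str) -> set[str]:
    """Lowercased words of four or more letters, used as a crude fingerprint."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def find_similar_issues(patterns_text: str, issue_description: str,
                        min_overlap: int = 2) -> list[str]:
    """Return entries sharing at least min_overlap keywords with the issue."""
    wanted = keywords(issue_description)
    scored = []
    for entry in split_entries(patterns_text):
        overlap = len(wanted & keywords(entry))
        if overlap >= min_overlap:
            scored.append((overlap, entry))
    # Highest overlap first; the sort is stable, so ties keep file order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]
```

Keyword overlap is deliberately crude. It is enough to surface candidate entries for a human or agent to read, not to decide that two issues share a cause.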

2. Gather Evidence

Collect all available evidence systematically:

Error messages and exceptions
Stack traces (full traceback)
Log files and outputs
User reproduction steps
Git history (recent changes)
Test failures

Evidence Collection Checklist:

3. Analyze Evidence

Group evidence by category:

Environment: OS, version, configuration
Code: Stack trace, line numbers, function calls
Data: Input values, state at failure
Timing: When failure occurred, sequence of events

Look for patterns:

Repeated errors
Common code paths
Environmental factors
Recent changes that may have introduced the issue
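
The grouping and pattern search described in this step can be sketched with a small record type. The `Evidence` fields and both helper names are assumptions made for illustration; the skill does not prescribe a data model.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    category: str  # e.g. "environment", "code", "data", "timing"
    source: str    # where the observation came from, e.g. a log file name
    detail: str    # the observation itself


def group_by_category(items: list[Evidence]) -> dict[str, list[Evidence]]:
    """Bucket evidence items by their category label."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.category].append(item)
    return dict(grouped)


def repeated_details(items: list[Evidence],
                     min_count: int = 2) -> list[tuple[str, int]]:
    """Return (detail, count) pairs seen at least min_count times."""
    counts = Counter(item.detail for item in items)
    return [(d, n) for d, n in counts.most_common() if n >= min_count]
```

A detail that repeats across different sources is a stronger signal than one repeated within a single log, so keeping `source` on each record makes that distinction possible later.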

4. Formulate Root Cause Hypothesis

Create hypothesis based on evidence (NOT speculation):

Root Cause: [Specific cause]
Evidence: [List of supporting evidence]
Confidence: [HIGH/MEDIUM/LOW]

Criteria for Root Cause:

Must be supported by evidence
Must explain all symptoms
Must be actionable (can be fixed)
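
The three criteria can be checked mechanically before a hypothesis is reported. This is a minimal sketch under assumed names: the `Hypothesis` fields and `unmet_criteria` are hypothetical, and "actionable" is approximated as "a proposed action has been written down".

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    root_cause: str
    evidence: list[str] = field(default_factory=list)
    symptoms_explained: set[str] = field(default_factory=set)
    proposed_action: str = ""


def unmet_criteria(h: Hypothesis, observed_symptoms: set[str]) -> list[str]:
    """Return the criteria the hypothesis fails; an empty list means it passes."""
    problems = []
    if not h.evidence:
        problems.append("not supported by evidence")
    missing = observed_symptoms - h.symptoms_explained
    if missing:
        problems.append("does not explain: " + ", ".join(sorted(missing)))
    if not h.proposed_action.strip():
        problems.append("not actionable")
    return problems
```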

5. Create Reproduction Steps

Write minimal steps to reproduce the issue:

1. [Step 1]
2. [Step 2]
...
N. [Observe failure]

Reproduction Requirements:

Minimal (no extra steps)
Reproducible (triggers the failure consistently)
Verifiable (can confirm fix)
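
The "reproducible" requirement can be measured rather than asserted. The sketch below assumes the reproduction steps have been wrapped in a callable that raises when the failure occurs; `reproduction_rate` is a hypothetical helper, not part of the skill.

```python
from typing import Callable


def reproduction_rate(repro: Callable[[], None], runs: int = 5) -> float:
    """Run repro several times and return the fraction of runs that raised."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = 0
    for _ in range(runs):
        try:
            repro()
        except Exception:
            # Any exception counts as the failure being reproduced.
            failures += 1
    return failures / runs
```

A rate of 1.0 before the fix and 0.0 after it also covers the "verifiable" requirement. A rate between the two usually points at timing or environmental factors that the steps do not yet pin down.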

6. Define Regression Test Requirements

Specify what tests should prevent this issue:

Test type (unit, integration, e2e)
Input scenarios
Expected behavior

7. Output Contract

Generate YAML contract with investigation results.

Output Contract Template

router_contract:
  status: COMPLETE
  workflow: DEBUG
  phase: INVESTIGATION
  evidence_collected:
    - "Error: Connection refused in auth.py:42"
    - "Stack trace: TimeoutException at network.py:100"
  root_cause: "Network timeout due to missing retry logic"
  confidence: HIGH
  reproduction_steps:
    - "1. Start the server"
    - "2. Send 100 concurrent requests"
    - "3. Observe connection refused errors"
  regression_test_requirements:
    type: "integration"
    scenarios:
      - "Concurrent requests exceed pool size"
      - "Network timeout under load"
    expected_behavior: "Graceful degradation with retry"
  similar_issues_in_memory:
    - "Issue: Auth timeout in PR #123"
    - "Fix: Added exponential backoff"
  next_phase: FIX
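
A contract like the one above could be sanity-checked before it is handed back to the router. The sketch assumes the YAML has already been parsed into a Python dict by whatever YAML library the project uses. The choice of required keys is an assumption based on the template (with `similar_issues_in_memory` treated as optional), and `validate_contract` is a hypothetical helper.

```python
REQUIRED_KEYS = {
    "status", "workflow", "phase", "evidence_collected", "root_cause",
    "confidence", "reproduction_steps", "regression_test_requirements",
    "next_phase",
}
ALLOWED_CONFIDENCE = {"HIGH", "MEDIUM", "LOW"}


def validate_contract(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the contract looks valid."""
    contract = doc.get("router_contract")
    if not isinstance(contract, dict):
        return ["missing router_contract mapping"]
    errors = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - contract.keys())]
    if contract.get("confidence") not in ALLOWED_CONFIDENCE:
        errors.append("confidence must be HIGH, MEDIUM, or LOW")
    # LOG FIRST: a contract without evidence or reproduction steps is incomplete.
    if not contract.get("evidence_collected"):
        errors.append("evidence_collected must not be empty")
    if not contract.get("reproduction_steps"):
        errors.append("reproduction_steps must not be empty")
    return errors
```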

LOG FIRST Protocol

The LOG FIRST protocol is mandatory for all investigations. The first letters of its eight steps spell the name:

List all evidence (errors, logs, traces)
Organize evidence by category
Group related evidence
Formulate hypothesis from evidence
Identify root cause
Reproduce the issue
Specify fix requirements
Test the hypothesis

Anti-Pattern (NOT ALLOWED):

Guessing the cause before gathering evidence
Implementing fixes without reproduction steps
Skipping evidence collection
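
The ordering rule behind the protocol can be enforced with a small guard that refuses any step taken out of sequence. The step identifiers and the `InvestigationLog` class below are hypothetical; they only illustrate how skipping evidence collection could be made impossible by construction.

```python
LOG_FIRST_STEPS = (
    "list", "organize", "group",
    "formulate", "identify", "reproduce", "specify", "test",
)


class InvestigationLog:
    """Tracks completed protocol steps and rejects out-of-order ones."""

    def __init__(self) -> None:
        self.completed: list[str] = []

    def complete(self, step: str) -> None:
        if len(self.completed) == len(LOG_FIRST_STEPS):
            raise ValueError("all steps already completed")
        expected = LOG_FIRST_STEPS[len(self.completed)]
        if step != expected:
            raise ValueError(f"expected step '{expected}', got '{step}'")
        self.completed.append(step)

    @property
    def finished(self) -> bool:
        return len(self.completed) == len(LOG_FIRST_STEPS)
```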

Variant Coverage Requirements

Ensure investigation considers:

Empty/null values: What happens with empty input?
Boundary conditions: Edge cases and limits
Concurrency: Race conditions, thread safety
Network issues: Timeouts, partitions, retries
Error states: Failure modes and recovery
Load conditions: Performance under stress
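
A simple way to keep these variants from being forgotten is to record a note for each one and report the gaps. The variant keys and `uncovered_variants` are assumptions chosen for this sketch.

```python
VARIANTS = (
    "empty_null", "boundary", "concurrency",
    "network", "error_states", "load",
)


def uncovered_variants(notes: dict[str, str]) -> list[str]:
    """Return the variants that have no note, or only a blank one."""
    return [v for v in VARIANTS if not notes.get(v, "").strip()]
```

A note such as "not applicable: endpoint takes no input" still counts as coverage, because it shows the variant was considered.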

Memory Integration

Before Investigation:

patterns = read_file(".opencode/context/01_memory/patterns.md")
similar_issues = extract_gotchas(patterns)

After Investigation (to be done in Closure):

new_gotcha = """
## <Date>: <Issue Title>
- **Issue:** <Description>
- **Root Cause:** <Root cause>
- **Fix:** <How it was fixed>
- **Reference:** <Commit/PR>
"""
append_to_patterns(new_gotcha)
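
A runnable version of the append step might look like the following. It is a sketch under assumptions: the entry format is taken from the template above, but `format_gotcha`, its parameters, and the explicit `path` argument to `append_to_patterns` are hypothetical, since the pseudocode does not define them.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def format_gotcha(title: str, issue: str, root_cause: str, fix: str,
                  reference: str, when: Optional[date] = None) -> str:
    """Render one patterns.md entry in the template's layout."""
    when = when or date.today()
    return (
        f"\n## {when.isoformat()}: {title}\n"
        f"- **Issue:** {issue}\n"
        f"- **Root Cause:** {root_cause}\n"
        f"- **Fix:** {fix}\n"
        f"- **Reference:** {reference}\n"
    )


def append_to_patterns(path: Path, entry: str) -> None:
    """Append an entry to the memory file, creating the file if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
```

Opening the file in append mode leaves earlier entries untouched, which matters when several investigations write to the same patterns.md.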

Examples

Example 1: Network Timeout Issue

User Report: "Login fails under load with timeout errors"

Investigation:

Evidence:
- Error: "Connection refused" in auth.py:42
- Stack trace shows TimeoutException
- Logs show pool exhaustion at 100 connections
Root cause: Connection pool size too small for concurrent users
Reproduction: Send 100+ concurrent login requests
Regression test: Test login with 150 concurrent requests
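
The pool-exhaustion reasoning in this example can be illustrated with a toy model. This is not the application's code: `ConnectionPool` and `count_refused` are hypothetical, and the model assumes the worst case in which every request holds its connection at the same moment.

```python
import threading


class ConnectionPool:
    """Toy pool that hands out at most `size` connections at once."""

    def __init__(self, size: int) -> None:
        self._slots = threading.BoundedSemaphore(size)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False immediately when the pool is exhausted.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def count_refused(pool_size: int, concurrent_requests: int) -> int:
    """Count requests refused when all of them hold a connection at once."""
    pool = ConnectionPool(pool_size)
    held = 0
    refused = 0
    for _ in range(concurrent_requests):
        if pool.try_acquire():
            held += 1
        else:
            refused += 1
    for _ in range(held):
        pool.release()
    return refused
```

With a pool of 100, `count_refused(100, 150)` returns 50 and `count_refused(100, 100)` returns 0, which is why the regression test targets a request count above the pool size.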

Example 2: Data Corruption Issue

User Report: "User profiles showing wrong data"

Investigation:

Evidence:
- Error: None (silent failure)
- Logs show race condition in user_store.py:88
- Recent commit #456 added async code
Root cause: Race condition in async user update
Reproduction: Two concurrent updates to same user
Regression test: Concurrent user profile updates
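
A regression check for this kind of race could run many concurrent updates against one user and assert that none is lost. The sketch below does not reproduce user_store.py; `UserStore` and `run_regression_check` are hypothetical stand-ins, and the lock is one possible fix, not necessarily the one applied.

```python
import asyncio


class UserStore:
    """Toy async store whose read-modify-write is guarded by a lock."""

    def __init__(self) -> None:
        self._profiles: dict[str, dict] = {}
        self._lock = asyncio.Lock()

    async def update(self, user_id: str, **fields) -> None:
        async with self._lock:
            profile = dict(self._profiles.get(user_id, {}))
            # Yield point: without the lock, another update could run here
            # and its write would be overwritten below.
            await asyncio.sleep(0)
            profile.update(fields)
            self._profiles[user_id] = profile

    def get(self, user_id: str) -> dict:
        return dict(self._profiles.get(user_id, {}))


async def run_regression_check(n: int) -> dict:
    """Apply n concurrent single-field updates and return the final profile."""
    store = UserStore()  # created inside the running loop
    await asyncio.gather(
        *(store.update("user-1", **{f"field_{i}": i}) for i in range(n))
    )
    return store.get("user-1")
```

Calling `asyncio.run(run_regression_check(20))` should yield a profile with all 20 fields. Removing the `async with` line makes later writers overwrite earlier ones, which is the symptom the user reported.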

Error Handling

If evidence collection fails:

Try alternative sources (different logs, additional instrumentation)
Ask user for additional information
Document what evidence was missing
Proceed with best hypothesis given available evidence

If root cause cannot be determined:

Document all attempted approaches
Suggest areas for further investigation
Propose potential causes with confidence levels

Confidence Scoring

Confidence in root cause hypothesis:

Level	Criteria
HIGH	Evidence from multiple sources, reproducible
MEDIUM	Evidence from one source, logical inference
LOW	Speculation without direct evidence
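
The table can be read as a small decision rule. The function below is one possible encoding, with two assumptions that the table leaves open: "multiple sources" is taken to mean two or more, and evidence from several sources that is not yet reproducible is scored MEDIUM. `score_confidence` is a hypothetical name.

```python
def score_confidence(independent_sources: int, reproducible: bool) -> str:
    """Map evidence strength to HIGH, MEDIUM, or LOW per the table above."""
    if independent_sources < 0:
        raise ValueError("independent_sources must not be negative")
    if independent_sources >= 2 and reproducible:
        return "HIGH"
    if independent_sources >= 1:
        return "MEDIUM"
    # No direct evidence at all: the hypothesis is speculation.
    return "LOW"
```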

Best Practices

Always gather evidence first - Never hypothesize without data
Document everything - Evidence, hypotheses, attempts
Reproduce before fixing - Verify you understand the problem
Write regression tests - Prevent the issue from recurring
Consider variants - Edge cases and non-default scenarios
Update memory - Share findings with future investigators

Anti-Patterns

❌ Guessing the cause before looking at evidence
❌ Implementing fixes without reproduction steps
❌ Skipping regression tests
❌ Ignoring variant coverage
❌ Not checking memory for similar issues
❌ Failing to update patterns.md after fix
