Planning — Write, Review, Execute

Overview

One skill for the full plan lifecycle: write → review → execute.

Phase 0: Deep Understanding

Before writing any plan, build deep understanding of the goal. Skip this phase only if the user provides a fully specified design doc.

Step 1: Form Initial Understanding

Read relevant code, docs, and recent commits to understand the context. Do NOT ask questions yet — first build your own mental model of:

What the user wants to achieve
What exists today (current state)
What would need to change (gap analysis)

Step 2: Ask Clarifying Questions

Based on your understanding, ask questions one at a time. Each question must:

Eliminate a whole branch of ambiguity (not trivial details)
Build on previous answers (incremental deepening)
Offer multiple-choice options with your recommendation when possible

Dynamic termination: Stop asking when remaining uncertainty won't materially affect the plan. Don't ask for the sake of asking.

Soft cap: Maximum 5 questions. If you still have uncertainty after 5, state your assumptions and proceed.
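
The termination rules above can be sketched as a small loop. This is only an illustration: the `Question` record, its `materially_affects_plan` flag, and the `ask` callback are hypothetical names, not part of the skill.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MAX_QUESTIONS = 5  # soft cap from the text above


@dataclass
class Question:
    text: str
    materially_affects_plan: bool  # hypothetical flag set by the agent's own judgment


def run_question_round(
    candidates: List[Question],
    ask: Callable[[str], str],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Ask one question at a time; return (answers, assumptions)."""
    answers: List[Tuple[str, str]] = []
    assumptions: List[str] = []
    for q in candidates:
        if not q.materially_affects_plan:
            continue  # dynamic termination: skip questions that won't change the plan
        if len(answers) >= MAX_QUESTIONS:
            # Cap reached: record an explicit assumption instead of asking.
            assumptions.append(f"Assumed without asking: {q.text}")
            continue
        answers.append((q.text, ask(q.text)))
    return answers, assumptions
```

The returned assumptions are what would be stated to the user before proceeding to Phase 1.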

Step 3: Research (optional)

After questions are answered, judge whether research is needed:

Codebase research: When the task touches existing code you haven't fully explored (e.g., modifying a hook system — read existing hooks first)
Web research: When the task involves external tools, APIs, or best practices you're unsure about (e.g., integrating a new library, adopting an unfamiliar pattern)
Both: When the task combines internal changes with external dependencies (e.g., adding OAuth to an existing auth module)
Skip: When you have sufficient understanding (e.g., renaming a variable, fixing a typo, simple refactors with clear scope)

Research dimension principle: When research IS needed, cover both theoretical foundations (papers, docs, design rationale) AND engineering practice (real implementations, battle-tested patterns, known pitfalls). One without the other leads to either ivory-tower designs or cargo-culted solutions.

This is your judgment call — not every plan needs research.

Step 4: Supplementary Questions (if any)

After research, absorb what you learned. Only ask the user about findings you cannot resolve from research alone — things requiring user decisions or preferences.

If no supplementary questions are needed, proceed directly to Phase 1.

Step 5: Design Presentation (optional)

When the task involves creative/architectural work (new features, new components, significant behavior changes), present the design before writing the plan:

Break the design into sections of 200-300 words
Ask after each section whether it looks right so far
Cover: architecture, components, data flow, error handling, testing
Write the validated design to docs/designs/YYYY-MM-DD-<topic>-design.md

Skip this step for simple refactors, bug fixes, or tasks with a fully specified design doc.

Transition to Phase 1

After Phase 0 completes, before writing the plan, validate each major design decision with a Socratic self-check:

Essence — What is the core problem this decision solves?
Framework — Does the current codebase already solve this? What known patterns apply?
Application — Is this feasible on all target platforms? Does benefit > maintenance cost?

Drop any decision that fails the Framework check (already solved) or the Application check (infeasible or not worth the maintenance cost).

Then proceed to Phase 1 with the accumulated understanding.

Error Handling

No relevant code/docs found: Inform the user, ask them to point you to the right area, then continue.
User wants to skip Phase 0: Allowed. User can say "skip questions" or "just write the plan" at any time. State your assumptions and proceed to Phase 1.
Contradictory answers: Surface the contradiction to the user, ask them to clarify which direction to take.
5-question cap reached with critical ambiguity: State remaining assumptions explicitly, proceed to Phase 1. The plan will note these assumptions for reviewer scrutiny.

Phase 1: Writing the Plan

Save to: docs/plans/YYYY-MM-DD-<feature-name>.md
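
A possible helper for building that path is sketched below. The slug rule (lowercase, with runs of non-alphanumerics collapsed to hyphens) is an assumption; the skill only fixes the `docs/plans/YYYY-MM-DD-<feature-name>.md` shape.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional


def plan_path(feature_name: str, today: Optional[date] = None) -> Path:
    """Build docs/plans/YYYY-MM-DD-<feature-name>.md from a free-form feature name."""
    today = today or date.today()
    # Assumed slug rule: lowercase, collapse anything that is not a letter or digit into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", feature_name.lower()).strip("-")
    if not slug:
        raise ValueError("feature name must contain at least one letter or digit")
    return Path("docs/plans") / f"{today.isoformat()}-{slug}.md"
```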

Plan Header (required)

# [Feature Name] Implementation Plan

**Goal:** [One sentence]
**Non-Goals:** [What this plan explicitly does NOT do]
**Architecture:** [2-3 sentences]
**Tech Stack:** [Key technologies]

## Review
<!-- Reviewer writes here -->

Checklist Format (enforced by hook)

Every plan must have a ## Checklist section. Every checklist item MUST include an executable verify command:

- [ ] description | `verify command`

Examples:

- [ ] hook syntax is valid | `bash -n hooks/security/my-hook.sh`
- [ ] config includes the new hook | `jq '.hooks' .kiro/agents/pilot.json | grep -q my-hook`
- [ ] external path is blocked | `echo '{"tool_name":"fs_write","tool_input":{"file_path":"/tmp/evil.txt"}}' | bash hooks/security/my-hook.sh 2>&1; test $? -eq 2`

Rules:

verify command must be executable (no "manual testing", no "visual inspection")
verify command must return exit 0 on success
Each Task must have at least 1 checklist item
Cover: happy path + edge case + integration (where applicable)
Hook enforces: checking off - [x] requires recent successful execution of the verify command
Regression test rule: If plan Files fields include scripts/ralph_loop.py or scripts/lib/, the checklist MUST include: - [ ] regression tests pass | `python3 -m pytest tests/ralph-loop/ -v`
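
The checklist format can be checked mechanically. The sketch below is not the enforcing hook itself, whose implementation is not shown here; `ITEM_RE`, `ChecklistItem`, `parse_checklist`, and `verify` are illustrative names, and it assumes items appear one per line in exactly the `- [ ] description | `verify command`` shape.

```python
import re
import subprocess
from typing import List, NamedTuple

# Description is matched lazily up to the first " | `"; the command runs to the last backtick,
# so commands that contain pipes (e.g. "jq ... | grep ...") stay intact.
ITEM_RE = re.compile(r"^- \[(?P<mark>[ x])\] (?P<desc>.+?) \| `(?P<cmd>.+)`\s*$")


class ChecklistItem(NamedTuple):
    done: bool
    description: str
    command: str


def parse_checklist(plan_text: str) -> List[ChecklistItem]:
    """Return all checklist items; raise if any item lacks a verify command."""
    items = []
    for line in plan_text.splitlines():
        if not line.startswith("- ["):
            continue
        m = ITEM_RE.match(line)
        if m is None:
            raise ValueError(f"checklist item without a verify command: {line!r}")
        items.append(ChecklistItem(m["mark"] == "x", m["desc"], m["cmd"]))
    return items


def verify(item: ChecklistItem, timeout_s: float = 60.0) -> bool:
    """Run the verify command in a shell; success means exit code 0."""
    result = subprocess.run(item.command, shell=True, timeout=timeout_s)
    return result.returncode == 0
```

An item such as `- [ ] manual testing` fails parsing because it has no command to execute.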

Task Structure (TDD)

Each task follows red-green-refactor:

### Task N: [Component Name]

**Files:**
- Create: `exact/path/to/file.py`
- Modify: `exact/path/to/existing.py`
- Test: `tests/exact/path/to/test.py`

**Step 1: Write failing test**
[Complete test code]

**Step 2: Run test — verify it fails**
Run: `pytest tests/path/test.py::test_name -v`
Expected: FAIL

**Step 3: Write minimal implementation**
[Complete implementation code]

**Step 4: Run test — verify it passes**
Run: `pytest tests/path/test.py::test_name -v`
Expected: PASS

**Step 5: Commit**

Rules: exact file paths, complete code (not "add validation"), exact commands with expected output.
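
To make "complete code" concrete, here is what Steps 1 and 3 might contain for a hypothetical `normalize_tag` helper. The function and its behavior are invented for illustration; in a real task the test and the implementation would live in the exact files named under Files.

```python
# Step 3: minimal implementation. It is written only after the test below has been seen to fail.
def normalize_tag(tag: str) -> str:
    """Lowercase a tag and strip surrounding whitespace."""
    return tag.strip().lower()


# Step 1: the test, written first. Before Step 3 exists it fails with a NameError/ImportError,
# which is the expected FAIL in Step 2.
def test_normalize_tag_strips_and_lowercases():
    assert normalize_tag("  Planning ") == "planning"
```

A placeholder such as "add normalization" would not meet the rule; the plan carries the literal test and implementation.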

Errors Section (required)

Every plan must have an ## Errors section at the bottom. During execution, log every error encountered:

## Errors

| Error | Task | Attempt | Resolution |
|-------|------|---------|------------|

Rules:

Log immediately when error occurs, don't wait
Include which Task triggered the error
Track attempt number — if same error appears at attempt 3, trigger 3-Strike Protocol (see Phase 2)
This section is append-only during execution — never delete entries
Cap: keep most recent 20 entries; if exceeded, summarize older entries into a single "Earlier errors: N resolved" row
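
One way to apply the cap while keeping the table append-only in spirit is sketched below. The function name and the tuple layout are assumptions; the table columns and the "Earlier errors: N resolved" wording come from the rules above.

```python
from typing import List, Tuple

MAX_ENTRIES = 20
HEADER = [
    "| Error | Task | Attempt | Resolution |",
    "|-------|------|---------|------------|",
]

Row = Tuple[str, str, int, str]  # (error, task, attempt, resolution)


def render_errors(rows: List[Row], earlier_resolved: int = 0) -> str:
    """Render the ## Errors table, folding overflow into one summary row."""
    overflow = max(0, len(rows) - MAX_ENTRIES)
    summarized = earlier_resolved + overflow
    recent = rows[overflow:]  # the most recent MAX_ENTRIES rows
    lines = ["## Errors", ""] + HEADER
    if summarized:
        lines.append(f"| Earlier errors: {summarized} resolved | | | |")
    for error, task, attempt, resolution in recent:
        lines.append(f"| {error} | {task} | {attempt} | {resolution} |")
    return "\n".join(lines)
```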

Findings Section (optional)

Plans may include a ## Findings section for persisting research discoveries made during execution:

## Findings

- [discovery with context]

Rules:

Append-only — never rewrite, only add new entries
Use when execution-phase research reveals something relevant to later tasks
Not required for simple plans where no research happens during execution

Phase 1.5: Plan Review

After writing the plan, run multi-perspective plan review before execution.

Angle Pool

Two categories: fixed (every round) and random (sampled each round).

Fixed angles (always included):

Angle	Mission	Output
Goal Alignment	You MUST copy each table below and fill EVERY cell. Do NOT summarize or skip rows. If a table has N tasks, your output must have N rows. Missing rows = review REJECTED. Copy and fill this table for EVERY task:\n\n\| Task # \| Goal phrase served (quote exact words) \| If removed, which Goal phrase loses coverage? \|\n\|--------\|---------------------------------------\|----------------------------------------------\|\n\| 1 \| [quote] \| [answer] \|\n\nThen copy and fill the coverage matrix:\n\n\| Goal phrase (copy from plan header) \| Covered by Task #s \|\n\|-------------------------------------\|-------------------\|\n\| [phrase 1] \| [list] \|\n\nFinally: trace the execution order — does Task N's output feed correctly into Task N+1's input? Findings must cite specific Task numbers and Goal phrases.	Missing Coverage / Unnecessary Tasks / Ordering Issues / Verdict
Verify Correctness	For each checklist verify command, you MUST copy this table and fill in EVERY cell:\n\n\| # \| Verify command \| Confirms what \| Exit code (correct impl) \| Exit code (broken impl) \| Sound? \|\n\|---\|---------------\|---------------\|--------------------------\|--------------------------\|--------\|

Random pool (2 sampled per round):

Angle	Mission	Analysis Method	Output
All angles	Before writing any finding, verify it is within the plan's stated Goal and NOT in Non-Goals. Findings outside scope are noise — discard silently.	—	—
Completeness	You MUST copy each table below and fill EVERY cell. Missing rows = review REJECTED.\n\nFor each file in the plan's Files: fields that is MODIFIED (not created), copy and fill:\n\n\| File \| Function/Branch \| Exercised by Task # \| Coverage? \|\n\|------\|----------------\|--------------------\|-----------\|\n\| [path] \| [name] \| [task # or NONE] \| [Y/N] \|\n\nThen for each error path (try/except, if-error-return, signal handler) in modified files:\n\n\| File:line \| Error path \| Exercised by Task # \|\n\|----------\|------------\|--------------------\|\n\| [path:line] \| [description] \| [task # or NONE] \|\n\nSCOPE: Only analyze functions/branches in files the plan MODIFIES. Do NOT flag functions in files the plan merely reads.	Source-to-task traceability matrix
Testability	You MUST copy this table and fill EVERY cell. Missing rows = review REJECTED.\n\nFor each Task's test case:\n\n\| Task # \| Assertion (what property) \| Minimal wrong impl that passes \| False negative? \|\n\|--------\|--------------------------\|-------------------------------\|----------------\|\n\| [N] \| [what is checked] \| [describe wrong impl] \| [Y/N + reason] \|\n\nOnly flag tests where you can construct a concrete wrong implementation that passes. "Might be weak" without a specific wrong impl = not actionable.
Technical Feasibility	For each Task: 1) list external dependencies (libraries, OS features, file system assumptions), 2) check if any dependency has platform/version constraints that conflict with Tech Stack, 3) for subprocess-based tests, verify timeout values are sufficient for the operations described, 4) for tests that run commands in tmp_path or isolated dirs, trace whether the command will behave correctly outside the project root (e.g. pytest rootdir detection, missing config files). Flag only concrete blockers, not theoretical risks.	Dependency + constraint audit	Blockers / Platform Risks / Verdict
Security	For each Task that touches file I/O, subprocess, or signal handling: 1) trace data flow from external input to execution, 2) check for path traversal, command injection, or symlink attacks in test fixtures, 3) verify temp files use secure creation (tmp_path, not hardcoded paths).	Data flow trace per Task	Injection Surfaces / Unsafe Patterns / Verdict
Compatibility & Rollback	For each modified file in the plan: 1) list existing tests that import or call functions in that file, 2) check if the plan's changes could break those existing tests, 3) verify the plan includes running existing tests (not just new ones). Also: can the plan's changes be reverted with a single `git revert`?	Existing-test impact analysis	Breaking Changes / Revert Safety / Verdict
Performance	For each Task involving subprocess or threading: 1) calculate worst-case wall-clock time (timeout × max_iterations × retry count), 2) sum across all Tasks to get total suite time, 3) flag any single test that could exceed 30s without @pytest.mark.slow. Provide concrete numbers, not estimates.	Quantified time budget per Task	Time Budget Table / Slow Test Violations / Verdict
Clarity	For each Task's "What to implement" section: 1) attempt to write the function signature and key assertions from the description alone (without reading source), 2) flag any Task where you cannot determine the exact test structure from the description. A clear plan = an executor agent can implement without reading source first.	Implementability dry-run	Ambiguous Tasks / Missing Specs / Verdict
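
The Performance angle's arithmetic (timeout × max_iterations × retry count, summed across tasks, with a 30s threshold) can be worked through as below. The `TimeBudget` record and the sample numbers are hypothetical, and `retry_count` is assumed to mean the total number of attempts (at least 1).

```python
from dataclasses import dataclass
from typing import List

SLOW_THRESHOLD_S = 30.0  # tests above this need @pytest.mark.slow


@dataclass
class TimeBudget:
    name: str
    timeout_s: float
    max_iterations: int
    retry_count: int  # assumed: total attempts, so 1 means no retry
    marked_slow: bool = False

    @property
    def worst_case_s(self) -> float:
        # Worst case: every iteration of every attempt runs to its full timeout.
        return self.timeout_s * self.max_iterations * self.retry_count


def total_suite_time(budgets: List[TimeBudget]) -> float:
    """Sum of worst-case times across all tests."""
    return sum(b.worst_case_s for b in budgets)


def slow_violations(budgets: List[TimeBudget]) -> List[str]:
    """Names of tests that can exceed the threshold without the slow marker."""
    return [b.name for b in budgets if b.worst_case_s > SLOW_THRESHOLD_S and not b.marked_slow]
```

For example, a test with a 10s timeout, 3 iterations, and 2 attempts has a 60s worst case and would be flagged unless it is marked slow.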

Angle Selection

Round 1: 2 fixed + 2 random = 4 reviewers (one parallel batch, no overflow). Round 2+: the 2 fixed angles only (see Orchestration).

Random selection: sample 2 from the random pool in Round 1. The fixed angles repeat every round: the same angle reviewing a revised plan catches regressions and verifies fixes.
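
A minimal sketch of composing a round, using the angle names from the tables above. The `compose_round` function is illustrative; it follows the Orchestration rule that Round 2+ dispatches only the fixed angles.

```python
import random
from typing import List, Optional

FIXED_ANGLES = ["Goal Alignment", "Verify Correctness"]
RANDOM_POOL = [
    "Completeness",
    "Testability",
    "Technical Feasibility",
    "Security",
    "Compatibility & Rollback",
    "Performance",
    "Clarity",
]


def compose_round(round_number: int, rng: Optional[random.Random] = None) -> List[str]:
    """Round 1: 2 fixed + 2 sampled angles. Round 2+: fixed angles only."""
    if round_number < 1:
        raise ValueError("round_number starts at 1")
    if round_number >= 2:
        return list(FIXED_ANGLES)
    rng = rng or random.Random()
    # sample() draws without replacement, so the two random angles always differ.
    return FIXED_ANGLES + rng.sample(RANDOM_POOL, 2)
```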

Dispatch Query Template

Each reviewer query MUST include: Context (Goal, Non-Goals, key design decisions), Mission (angle-specific from table above), files to read, and anti-patterns.

## Context
Goal: [one sentence from plan header]
Non-Goals: [from plan header]
Key design decisions that reviewers might mistake for gaps:
- [decision 1 — what was chosen and what was intentionally excluded]
- [decision 2]

## Your Mission
This is a PLAN REVIEW (Mode 1 in your prompt).
[angle-specific mission from the table above]

## Read These Files
Plan: [path]
Source files referenced in plan: [list — reviewer must read before claiming code behavior]

## Anti-patterns (do NOT do these)
- Do not flag issues outside the stated Goal/Non-Goals
- Do not suggest alternative approaches that are equally valid
- Do not flag missing implementation details that an executor agent can infer
- [plan-specific anti-patterns if any]

## Specific Questions for This Plan
Answer each question with evidence (file:line or shell output). Unanswered = review REJECTED.
1. [risk question identified by main agent]
2. [risk question identified by main agent]

## Source Reading Canary
Answer this BEFORE your analysis. Wrong answer = review REJECTED.
Q: [question only answerable by reading specific source file, e.g. "What is the first line of function X in file Y?"]

## Mandatory Source Reading
Before making ANY claim about code behavior, you MUST:
1. Read the actual source file (use Bash: cat <file>)
2. Cite the specific line number in your finding
3. If you haven't read the file, do NOT speculate — read it first
Findings about code behavior without file:line citations will be discarded.

## Output Requirements
Your last line MUST be exactly one of:
  Verdict: APPROVE
  Verdict: REQUEST CHANGES
Missing verdict = review REJECTED and will be re-dispatched.
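
The verdict requirement lends itself to a strict check on the main agent's side. The sketch below uses a hypothetical `parse_verdict` helper; tolerating surrounding whitespace on the last line is an assumption.

```python
from typing import Optional

VALID_VERDICTS = ("APPROVE", "REQUEST CHANGES")


def parse_verdict(review_output: str) -> Optional[str]:
    """Return the verdict if the last non-empty line is well-formed, else None."""
    lines = [ln.strip() for ln in review_output.splitlines() if ln.strip()]
    if not lines:
        return None
    last = lines[-1]
    for verdict in VALID_VERDICTS:
        if last == f"Verdict: {verdict}":
            return verdict
    return None  # malformed: the caller re-dispatches this angle
```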

Orchestration

Compose the round: Goal Alignment + Verify Correctness + 2 random angles
Dispatch the 4 reviewer subagents in ONE use_subagent call, specifying agent_name: "reviewer" for each (the same agent_name can spawn multiple instances in parallel). Each reviewer query = review angle mission + plan file path. Pass the path, not the plan content: reviewers have read/shell tools and read the file themselves, and pasting the plan bloats the payload and breaks 4-way parallelism. Include in each query: "Read the source files referenced in the plan before making claims about code behavior."
Reviewers in the same round do NOT see each other's feedback
Collect all verdicts. If ANY reviewer returns REQUEST CHANGES → fix issues → next round. Verdict enforcement: If a reviewer's output does not end with Verdict: APPROVE or Verdict: REQUEST CHANGES, treat it as malformed → re-dispatch that single angle.
Round 2+ rule: When re-dispatching after fixes, include in each query a "Rejected Findings" section with one-line summaries of findings rejected in previous rounds and why. Reviewers must not re-raise these. Round 2+ reviewer count: Dispatch only 2 reviewers (the 2 fixed angles: Goal Alignment + Verify Correctness). Do NOT sample random angles in Round 2+. Purpose of Round 2+ is to verify fixes, not discover new issues.
Repeat until all APPROVE in a single round, or 3 rounds reached
After 3 rounds: stop and tell user "Plan too complex for automated review. Consider breaking into smaller plans."
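
The round loop above can be sketched as follows. The `run_round` and `apply_fixes` callbacks are hypothetical stand-ins for dispatching reviewers and revising the plan; only the control flow (all APPROVE in one round, or stop after 3 rounds) comes from the steps above.

```python
from typing import Callable, Dict

MAX_ROUNDS = 3


def review_loop(
    run_round: Callable[[int], Dict[str, str]],
    apply_fixes: Callable[[Dict[str, str]], None],
) -> str:
    """run_round(n) returns {angle: verdict}. Returns 'approved' or 'escalate'."""
    for round_number in range(1, MAX_ROUNDS + 1):
        verdicts = run_round(round_number)
        # An empty result never counts as approval.
        if verdicts and all(v == "APPROVE" for v in verdicts.values()):
            return "approved"
        if round_number < MAX_ROUNDS:
            apply_fixes(verdicts)  # revise the plan before the next round
    # Three rounds without unanimous approval: tell the user to split the plan.
    return "escalate"
```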

Reviewer Calibration

Reviewers should return REQUEST CHANGES only for issues that would cause the plan to fail or produce wrong results. Do NOT request changes for:

Style preferences or alternative approaches that are equally valid
Theoretical risks that are unlikely in practice
Missing features that are nice-to-have but not required for the plan's stated goal

The bar is "would this plan produce a 90/100 result?" not "is this plan perfect?"

Conflict Resolution

When reviewers give contradictory feedback:

Main agent compares both arguments against the plan's Goal statement (the one-sentence goal in the plan header)
The argument that directly serves the stated Goal wins
Document the conflict, both arguments, and the resolution in the plan's Review section
If both arguments equally serve the goal, ask the user to decide

Resource Constraints

Max parallel subagents per batch: 4 (tool hard limit). Round 1: 4 reviewers. Round 2+: 2 reviewers (fixed angles only).
Reviewer context isolation: Reviewers in the same round do NOT see each other's feedback. Each gets the full plan.
Context size: Reviewers read the full plan file verbatim, not a summary. They need complete task details, code blocks, and file paths to avoid false rejections from incomplete information.
Error handling: If a reviewer crashes or returns malformed output, continue with remaining reviewers. If fewer than half of the round's reviewers complete, restart the round. Malformed = missing Mission/Findings/Verdict structure.

Phase 2: Execution

After the plan is reviewed and approved, execute it following the disciplines and strategy below.

Execution Disciplines

These rules apply regardless of which execution strategy is chosen.

Session Resume Protocol

When starting or resuming execution (including new sessions):

Read the plan's Goal + Architecture + Non-Goals
Run git diff --stat to see what's already changed
Check checklist: which items are [x] done, which [ ] remain
Write a one-line status summary to the plan's ## Findings section

This ensures the agent has full context before making any changes.
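
The checklist and diff steps of the resume protocol could be scripted roughly as below. The function names and the wording of the status line are assumptions; `git diff --stat` is the command named in the protocol.

```python
import re
import subprocess
from typing import Tuple

CHECKBOX_RE = re.compile(r"^- \[([ x])\] ", re.MULTILINE)


def checklist_progress(plan_text: str) -> Tuple[int, int]:
    """Return (done, remaining) counts for checklist items in the plan."""
    marks = CHECKBOX_RE.findall(plan_text)
    done = marks.count("x")
    return done, len(marks) - done


def git_diff_stat() -> str:
    """Output of `git diff --stat` for the current working tree."""
    result = subprocess.run(
        ["git", "diff", "--stat"], capture_output=True, text=True, check=False
    )
    return result.stdout


def status_line(plan_text: str, diff_stat: str) -> str:
    """One-line summary suitable for appending to the plan's ## Findings section."""
    done, remaining = checklist_progress(plan_text)
    stat_lines = diff_stat.strip().splitlines()
    # The last line of --stat output is the totals line, e.g. "1 file changed, ...".
    changes = stat_lines[-1].strip() if stat_lines else "no changes"
    return f"- Resume status: {done} done, {remaining} remaining; git diff --stat: {changes}"
```

A caller would pass `git_diff_stat()` as the second argument of `status_line` and append the result to the plan file.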

Read Before Decide

Before any of these actions, re-read the plan's Goal and Non-Goals:

Changing implementation approach mid-task
Deciding to skip or reorder a task
Encountering a blocker and choosing a workaround
Adding scope not in the original plan

This pushes the original intent back into the attention window, preventing drift after many tool calls.

Periodic Re-orientation

Every 3 completed tasks, re-read the plan's Goal paragraph. No writing needed — purely attention refresh. This counters gradual context decay in long execution sessions.

3-Strike Error Protocol

When an error occurs during execution:

Strike 1 — Diagnose & Fix: Read error carefully, identify root cause, apply targeted fix. Log to ## Errors.

Strike 2 — Alternative Approach: Same error? Try a fundamentally different method. Different tool, different algorithm, different angle. Log to ## Errors.

Strike 3 — Broader Rethink: Question assumptions. Search for solutions. Consider whether the plan itself needs revision. Log to ## Errors.

After 3 strikes: Stop and escalate to user. Explain what was tried, share the specific errors, ask for guidance. Do NOT attempt a 4th time with the same approach.

Rules:

next_action != failed_action — never repeat the exact same failing approach
Each strike must be logged in the plan's ## Errors table with attempt number
Strike count is per-error-type, not global (different errors get their own 3 strikes)
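
Per-error-type strike counting might look like the sketch below. `StrikeTracker` and the action strings are illustrative; it assumes that the fourth occurrence of the same error type is the point at which the agent escalates.

```python
from collections import defaultdict
from typing import Dict

STRIKE_ACTIONS = {
    1: "diagnose and fix",
    2: "alternative approach",
    3: "broader rethink",
}


class StrikeTracker:
    """Counts strikes separately for each error type."""

    def __init__(self) -> None:
        self._strikes: Dict[str, int] = defaultdict(int)

    def record(self, error_type: str) -> str:
        """Register one occurrence and return the prescribed response."""
        self._strikes[error_type] += 1
        attempt = self._strikes[error_type]
        # Beyond strike 3 there is no further retry: stop and ask the user.
        return STRIKE_ACTIONS.get(attempt, "escalate to user")

    def attempt(self, error_type: str) -> int:
        """Attempt number to log in the ## Errors table."""
        return self._strikes[error_type]
```

Because counts are keyed by error type, a new kind of error starts again at strike 1 even when another error has already used its three strikes.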

Execution Strategy

Sequential execution: one task at a time, commit after each.

Load plan, identify next unchecked item
Execute task (implement + test + verify)
Check off item, commit
Continue to next. Repeat until done.

Each ralph loop iteration spawns a fresh CLI with clean context. The agent should complete as many tasks as possible per iteration before context fills up.

Phase 3: Completion

After all tasks are done:

Run full test suite
Present options: merge locally / create PR / keep branch / discard
Clean up worktree if applicable

When to Stop and Ask

Hit a blocker (missing dependency, unclear instruction)
Verification fails repeatedly
Plan has critical gaps
Don't force through blockers — stop and ask.
