Test-Driven Development

Overview

Write the test first. Watch it fail. Write minimal code to pass. Refactor. Commit.

Core principle: If you didn't watch the test fail, you don't know if it tests the right thing.

Announce at start: "I'm using gambit:test-driven-development to implement this with RED-GREEN-REFACTOR."

Rigidity Level

LOW FREEDOM — Follow these exact steps in order. Do not adapt, skip, or reorder.

Violating the letter of the rules is violating the spirit of the rules.

Quick Reference

Phase	Action	Expected Result
RED	Write one failing test	Test FAILS with expected message
Verify RED	Run test, read failure	Fails because feature missing (not typo)
GREEN	Write minimal code	Test passes
Verify GREEN	Run ALL tests	All green, no regressions
REFACTOR	Clean up while green	Tests still pass
COMMIT	Commit the increment	Behavior captured

The Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST

Wrote code before the test? Delete it. Start over.

Don't keep it as "reference"
Don't "adapt" it while writing tests
Don't look at it
Delete means delete

Fresh implementation from tests. No exceptions.

When to Use

Always when writing production code:

New features or functions
Bug fixes (test reproduces the bug first)
Behavior changes during refactoring
Any code that should have test coverage

Exceptions (confirm with your human partner):

Throwaway prototypes (will be deleted)
Generated code (output, not generators)
Configuration files

TDD vs Executing-Plans

These skills are complementary, not competing:

	TDD	Executing-Plans
Scope	Single function or behavior	Multi-task epic
Controls	HOW code is written	WHICH task, WHEN to stop
Cycle	RED → GREEN → REFACTOR → COMMIT	Load → Execute → Checkpoint → STOP
Checkpoints	After each test passes	After each task completes
Relationship	Called BY executing-plans	Calls TDD during execution

TDD is a coding discipline used WITHIN task execution. Executing-plans decides WHAT to build; TDD decides HOW to build it.

The Process

1. RED — Write One Failing Test

Write a single test for one behavior. Not two. Not "and."

Requirements:

Test ONE behavior ("and" in name? Split it)
Name describes the behavior: test_rejects_empty_email, not test_email
Use real objects, not mocks (unless external dependency)
Assert on behavior, not implementation details

Run the test. Read the failure message.

Confirm ALL THREE:

Test fails (not errors with syntax issues)
Failure message matches expectation ("function not found" or assertion mismatch)
Fails because feature is MISSING, not because of typos

If test passes immediately: Your test is wrong. It tests existing behavior, not new behavior. Fix the test before continuing.

If test has syntax errors: Fix syntax, re-run until you get a proper assertion failure.

2. GREEN — Write Minimal Code

Write the simplest code that makes the test pass. Nothing more.

Minimal means:

No features the test doesn't exercise
No error handling the test doesn't check
No type annotations the test doesn't require
No docstrings or comments
Hardcoded values are fine if only one test exercises the case

Run ALL tests (not just the new one). Confirm:

New test passes
All existing tests still pass
No warnings or errors

If new test fails: Fix code, not test. If other tests break: Fix regressions NOW, before continuing.

3. REFACTOR — Clean Up While Green

Only after ALL tests pass:

Remove duplication
Improve names
Extract helpers if repeated
Simplify logic

Run tests after each refactoring change. If tests break, undo the refactoring step.

Do NOT add behavior during refactoring. No new features, no new edge cases. Those are new RED cycles.

4. COMMIT — Capture the Increment

git add [test file] [implementation file]
git commit -m "feat(module): [behavior description]"

Commit message describes the behavior, not the test.

5. REPEAT

Next behavior → next failing test → next minimal implementation.

Each RED-GREEN-REFACTOR-COMMIT cycle should be small — one behavior at a time.

Bug Fix Workflow

Bug fixes follow TDD too. The test reproduces the bug.

RED: Write test that exercises the buggy input/behavior
Verify RED: Run test — confirms the bug exists (test fails)
GREEN: Fix the bug with minimal change
Verify GREEN: Run ALL tests — bug fixed, no regressions
COMMIT: The regression test permanently guards against this bug

Example: Feature with TDD

Task: Add email validation that rejects empty strings and missing @ symbols.

RED:

def test_rejects_empty_email():
    result = validate_email("")
    assert result is False

Verify RED:

$ pytest -k test_rejects_empty_email
FAILED: NameError: name 'validate_email' is not defined

Good — fails because function doesn't exist.

GREEN:

def validate_email(email):
    if not email:
        return False
    return True

Minimal. Only handles the case the test checks.

Verify GREEN:

$ pytest
4 passed

Next RED (new behavior):

def test_rejects_missing_at_symbol():
    result = validate_email("userexample.com")
    assert result is False

Verify RED → GREEN → Verify GREEN → REFACTOR → COMMIT → REPEAT.

Testing Anti-Patterns

Never Test Mock Behavior

BAD:  assert mock.was_called()        # Tests plumbing, not behavior
GOOD: assert result == expected_value  # Tests actual output

Never Add Test-Only Code to Production

BAD:  def reset(self): ...  # Only used in tests, dangerous in prod
GOOD: Test utilities handle setup/teardown externally

Never Mock Without Understanding

Before mocking any dependency:

What side effects does the real code have?
Does this test depend on those side effects?
If yes → mock at a lower level, not this method

Red Flags — STOP and Start Over

Wrote code before test
Test passes immediately
Can't explain why test failed
Adding features during GREEN beyond what test requires
Tests added "later"
Rationalizing "just this once"
"I already manually tested it"
"Keep as reference" or "adapt existing code"
"Already spent X hours, deleting is wasteful"
"This is different because..."

All of these mean: Delete code. Start over with RED.

Common Rationalizations

Excuse	Reality
"Too simple to test"	Simple code breaks. Test takes 30 seconds.
"I'll test after"	Tests passing immediately prove nothing.
"Already manually tested"	Ad-hoc ≠ systematic. Can't re-run.
"Deleting X hours is wasteful"	Sunk cost. Unverified code is debt.
"Need to explore first"	Fine. Throw away exploration, start with RED.
"Test is hard to write"	Hard to test = hard to use. Simplify the design.
"TDD slows me down"	TDD is faster than debugging in production.
"This is different because..."	It's not. RED-GREEN-REFACTOR.

Language-Specific Commands

See REFERENCE.md for test runner commands by language (Go, TypeScript, Rust, Python, Ruby, etc.).

Verification Checklist

Before marking work complete:

Every new function/method has a test
Watched each test fail before implementing
Each test failed for expected reason (feature missing, not typo)
Wrote minimal code to pass each test (no extras)
All tests pass with no warnings
Tests use real code (mocks only if unavoidable)
Edge cases covered as separate RED-GREEN cycles
Changes committed after each green cycle

Can't check all boxes? You skipped TDD. Start over.

Integration

Called by:

gambit:executing-plans (when implementing task steps)
gambit:debugging (write failing test reproducing bug)

Calls:

gambit:verification (running tests to verify RED and GREEN)

Workflow:

RED (write test) → Verify fail → GREEN (minimal code) → Verify pass → REFACTOR → COMMIT → REPEAT

test-driven-developmentSafety 95Repository

Package Files

Test-Driven Development

Overview

Rigidity Level

Quick Reference

The Iron Law

When to Use

TDD vs Executing-Plans

The Process

1. RED — Write One Failing Test

2. GREEN — Write Minimal Code

3. REFACTOR — Clean Up While Green

4. COMMIT — Capture the Increment

5. REPEAT

Bug Fix Workflow

Example: Feature with TDD

Testing Anti-Patterns

Never Test Mock Behavior

Never Add Test-Only Code to Production

Never Mock Without Understanding

Red Flags — STOP and Start Over

Common Rationalizations

Language-Specific Commands

Verification Checklist

Integration

Install

AI Quality Score

Metadata

Tags

test-driven-developmentSafety 95Repository ShareFavorite skill

Package Files

Test-Driven Development

Overview

Rigidity Level

Quick Reference

The Iron Law

When to Use

TDD vs Executing-Plans

The Process

1. RED — Write One Failing Test

2. GREEN — Write Minimal Code

3. REFACTOR — Clean Up While Green

4. COMMIT — Capture the Increment

5. REPEAT

Bug Fix Workflow

Example: Feature with TDD

Testing Anti-Patterns

Never Test Mock Behavior

Never Add Test-Only Code to Production

Never Mock Without Understanding

Red Flags — STOP and Start Over

Common Rationalizations

Language-Specific Commands

Verification Checklist

Integration

Install

AI Quality Score

Metadata

Tags

test-driven-developmentSafety 95Repository