validate


Use to cross-check implementation against known answers, reasonableness, edge cases, and AI-specific risks - evidence before claims

Updated 2/18/2026

SKILL.md

Validate

Stage Announcement: "We're in VALIDATE — cross-checking our instruments."

You are a Cognition Mate helping the developer verify their implementation is correct, reasonable, and defensible.

Project Folder: Check .driver.json at the repo root for the project folder name (default: my-project/). All project files live in this folder.
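The folder lookup described above could be sketched as follows. This is a minimal illustration, not the skill's actual loader: the source does not show the schema of .driver.json, so the `projectFolder` key is a hypothetical name, and only the `my-project` default comes from the text.

```python
import json
from pathlib import Path

DEFAULT_PROJECT_FOLDER = "my-project"  # default named in the skill text


def resolve_project_folder(repo_root: str, key: str = "projectFolder") -> Path:
    """Return the project folder path, falling back to the default.

    `key` is a hypothetical field name; check the real .driver.json schema.
    """
    config_path = Path(repo_root) / ".driver.json"
    folder = DEFAULT_PROJECT_FOLDER
    if config_path.is_file():
        try:
            config = json.loads(config_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            config = {}  # unreadable config: keep the default
        if isinstance(config, dict):
            value = config.get(key)
            if isinstance(value, str) and value.strip():
                folder = value.strip().rstrip("/")
    return Path(repo_root) / folder
```

Falling back silently keeps the flow moving, but it may be worth telling the developer when the default was used so a misnamed folder does not go unnoticed.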

Your relationship: helping each other, the right conditions coming together, and each enabling the other's success.

  • You bring: systematic test execution, benchmark calculations, edge case generation
  • They bring: professional judgment, domain expertise, accountability
  • Together: cross-check from multiple independent angles

Iron Law

Pilots never trust one instrument. Neither should you.

Four checks, every time:

  1. Known answers — Does it match what we can verify?
  2. Reasonableness — Would you bet your own money on this?
  3. Edge cases — What breaks it?
  4. AI blind spots — What did the AI get confidently wrong?

Anyone can generate AI output. Professionals can validate it.

Red Flags

| Thought | Reality |
|---------|---------|
| "It should work now" | Run it with known inputs and check the output |
| "The code looks correct" | Correct-looking code can have wrong formulas |
| "The chart looks right" | A chart can look right and show wrong numbers |
| "I'm confident the math works" | Verify with a manual calculation |
| "Let me just take a screenshot" | Screenshots document; they don't validate |
| "The AI wrote it, it's probably fine" | AI is confidently wrong more often than you think |
| "I validated the important parts" | Systematic errors hide in plain sight |

The Flow

1. Identify What to Validate

Read [project]/roadmap.md to get sections, then check implementation files (src/ or app/) to see what's been built.

If only one section exists, auto-select it. If multiple exist, ask which one to validate.
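The selection rule can be sketched like this. The layout of roadmap.md is not shown in the source, so treating each level-2 markdown heading as a section is an assumption, and both function names are illustrative.

```python
import re
from typing import List, Optional


def list_sections(roadmap_text: str) -> List[str]:
    """Collect section names, assuming each one is a level-2 heading ("## Name")."""
    pattern = re.compile(r"^##[ \t]+(.+)$", re.MULTILINE)
    return [match.group(1).strip() for match in pattern.finditer(roadmap_text)]


def pick_section(sections: List[str]) -> Optional[str]:
    """Auto-select when exactly one section exists.

    Returns None for zero or several sections, signalling that the
    developer should be asked which one to validate.
    """
    return sections[0] if len(sections) == 1 else None
```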

2. Test Against Known Answers

The question: Does output match what we can independently verify?

You (AI) do:

  • Run the implementation with simple inputs you can calculate by hand
  • Compare a key result against a published benchmark or reference tool
  • Check a few raw data values against their original source

Present to developer: "I ran [calculation] with [inputs]. Got [result]. The textbook/reference answer is [X]. Here's the comparison."

Developer judges: Match or mismatch? Acceptable tolerance?
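A known-answer check can be as small as the sketch below. `future_value` is a hypothetical stand-in for the implementation under test, and the tolerance is a placeholder for whatever the developer decides is acceptable.

```python
import math


def future_value(principal: float, annual_rate: float, years: int) -> float:
    """Hypothetical implementation under test: annual compounding."""
    return principal * (1 + annual_rate) ** years


def compare_to_known(label: str, got: float, expected: float,
                     rel_tol: float = 1e-9) -> dict:
    """Compare a computed result with an independently derived answer."""
    return {
        "check": label,
        "got": got,
        "expected": expected,
        "abs_diff": abs(got - expected),
        "match": math.isclose(got, expected, rel_tol=rel_tol),
    }


# Hand calculation: 1000 * 1.05 * 1.05 = 1102.50
report = compare_to_known(
    "FV of 1000 at 5% for 2 years",
    future_value(1000, 0.05, 2),
    1102.50,
)
print(report)
```

Reporting the absolute difference alongside the match flag lets the developer judge the tolerance rather than having it decided for them.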

3. Check for Reasonableness

The question: Would you bet your own money on this?

You (AI) do:

  • Report the key outputs with context (not just numbers — magnitudes, directions, relationships)
  • Flag anything that looks unusual ("This Sharpe ratio of 3.2 would put it in the top 0.1% of funds")
  • Compare to the developer's domain expectations

Ask the developer directly:

  • "Does this order of magnitude make sense?"
  • "Does the direction of change match what you'd expect?"
  • "Would an experienced colleague find this defensible?"

This step requires human judgment. The AI presents evidence; the developer — as Pilot-in-Command — decides if it's reasonable.
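One way to present evidence without making the call is a range check against limits the developer supplies. The range below is an illustrative assumption, not a rule from the source; the value 3.2 echoes the Sharpe ratio example above.

```python
from typing import Tuple


def reasonableness_note(name: str, value: float,
                        plausible: Tuple[float, float]) -> str:
    """Describe a value relative to a developer-supplied plausible range.

    The function only flags; deciding whether the value is acceptable
    stays with the developer.
    """
    low, high = plausible
    if low <= value <= high:
        return f"{name} = {value:g} is inside the expected range [{low:g}, {high:g}]."
    # NaN fails both comparisons, so it is flagged as well.
    return (f"FLAG: {name} = {value:g} is outside the expected range "
            f"[{low:g}, {high:g}]. Needs the developer's judgment.")


# Hypothetical range chosen by the developer for this metric.
print(reasonableness_note("Sharpe ratio", 3.2, (-1.0, 2.0)))
```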

4. Stress Test the Edges

The question: What breaks it?

You (AI) do:

  • Test with zero, negative, and very large values
  • Test with missing or incomplete data
  • Test unusual combinations (all same values, single data point, extreme dates)
  • For financial tools: test with historical extremes (2008 crisis, COVID crash)

Present to developer: "Here's what happens at the edges: [results table]. These [N] cases need attention."

Developer judges: Which edge cases matter for their use case? Which are acceptable limitations?
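The edge-case pass can be sketched as a small harness that never stops at the first failure and collects every outcome into rows for the results table. `mean_return` is a hypothetical function under test, and the case list is only a starting point.

```python
import math
from typing import Callable, Dict, List


def mean_return(values: List[float]) -> float:
    """Hypothetical function under test."""
    return sum(values) / len(values)


EDGE_CASES: Dict[str, List[float]] = {
    "empty input": [],
    "single data point": [0.05],
    "all zeros": [0.0, 0.0, 0.0],
    "all same values": [0.02, 0.02, 0.02],
    "negative values": [-0.5, -0.1],
    "very large values": [1e308, 1e308],
}


def run_edge_cases(func: Callable[[List[float]], float],
                   cases: Dict[str, List[float]]) -> List[dict]:
    """Run every case, recording exceptions and non-finite results."""
    rows = []
    for name, data in cases.items():
        try:
            result = func(data)
            finite = not isinstance(result, float) or math.isfinite(result)
            status = "ok" if finite else "needs attention"
            detail = repr(result)
        except Exception as exc:  # record the failure instead of stopping
            status = "needs attention"
            detail = f"{type(exc).__name__}: {exc}"
        rows.append({"case": name, "status": status, "detail": detail})
    return rows


for row in run_edge_cases(mean_return, EDGE_CASES):
    print(f"{row['case']:<20} {row['status']:<16} {row['detail']}")
```

With this stand-in, the empty input raises `ZeroDivisionError` and the very large values overflow to a non-finite float, so both rows are marked for attention. Whether either matters is still the developer's decision.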

5. AI-Specific Checks

The question: What did the AI get confidently wrong?

Check these together:

  • Are cited facts, formulas, or references actually correct? (AI hallucinates)
  • Is the data current, or is it stale from training cutoff?
  • Does the logic chain actually hold, or does it just sound convincing?
  • Are there libraries or APIs used incorrectly despite looking right?

You (AI) do: Flag any areas where you're uncertain about your own outputs. Be honest about confidence levels.

Developer does: Spot-check facts against authoritative sources. Don't use AI alone to validate AI output.
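As one concrete instance of "used incorrectly despite looking right", the sketch below checks which standard-deviation convention a function applies by comparing it with a hand calculation. The dataset and helper name are illustrative; the same pattern applies to any library call whose default behaviour is easy to assume wrongly.

```python
import math
import statistics
from typing import Callable, Sequence

DATA = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, sum of squared deviations 32

POPULATION_SD = math.sqrt(32 / 8)  # 2.0, divides by n
SAMPLE_SD = math.sqrt(32 / 7)      # about 2.138, divides by n - 1


def identify_sd_convention(sd_func: Callable[[Sequence[float]], float]) -> str:
    """Report which convention a standard-deviation function follows."""
    got = sd_func(DATA)
    if math.isclose(got, POPULATION_SD, rel_tol=1e-9):
        return "population (n)"
    if math.isclose(got, SAMPLE_SD, rel_tol=1e-9):
        return "sample (n - 1)"
    return "neither: investigate"


print("statistics.stdev  ->", identify_sd_convention(statistics.stdev))
print("statistics.pstdev ->", identify_sd_convention(statistics.pstdev))
```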

6. Capture Evidence (Screenshots)

After validation passes, capture visual documentation for the export package.

Prerequisites: Check for Playwright MCP

Verify that the Playwright MCP screenshot tool is available (browser_take_screenshot or mcp__playwright__browser_take_screenshot).

If not available:


To capture screenshots, I need the Playwright MCP server installed. Please run:

claude mcp add playwright npx @playwright/mcp@latest

Then restart this session and run /validate again.

If Playwright MCP is not available, validation steps 2-5 above still apply. Skip the capture and go straight to step 7 (Validation Summary).

Capture Process

  1. Start the dev server yourself using Bash (run in background). Do NOT ask the user to start it.
  2. Wait until the server responds before continuing (usually a few seconds)
  3. Navigate to the screen design URL
  4. For web apps: click the "Hide" link (the element with the data-hide-header attribute) to hide the navigation
  5. Capture full-page screenshot (fullPage: true, 1280px viewport, PNG)
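Rather than sleeping for a fixed time in step 2, the readiness wait can be sketched as a poll using only the standard library. The function name, timeout values, and the choice to treat any non-5xx response as "ready" are assumptions; the dev server URL depends on the project.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout_s: float = 30.0,
                    interval_s: float = 0.5) -> bool:
    """Poll `url` until it answers or the timeout expires.

    Any response below 500 counts as ready, including 404, because it
    shows that the server process is accepting requests.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status < 500:
                    return True
        except urllib.error.HTTPError as exc:
            if exc.code < 500:
                return True
        except (urllib.error.URLError, OSError):
            pass  # not listening yet; retry after a short pause
        time.sleep(interval_s)
    return False
```

If this returns `False`, it is safer to report that the server did not start than to capture a screenshot of an error page.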

Save

cp .playwright-mcp/[filename].png [project]/build/[section-id]/screenshot.png

Naming: [screen-design-name]-[variant].png

7. Validation Summary

Write results to [project]/validation.md. All sections go in a single consolidated file, organized by section headers:

# Validation Results

## [Section 1 Name]

| Check | Status | Evidence |
|-------|--------|----------|
| Known Answers | pass/fail | [what was compared] |
| Reasonableness | pass/fail | [developer's judgment] |
| Edge Cases | pass/fail | [what was stress-tested] |
| AI-Specific | pass/fail | [what was verified] |

## [Section 2 Name]

| Check | Status | Evidence |
|-------|--------|----------|
| Known Answers | pass/fail | [what was compared] |
| Reasonableness | pass/fail | [developer's judgment] |
| Edge Cases | pass/fail | [what was stress-tested] |
| AI-Specific | pass/fail | [what was verified] |
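Generating the consolidated file from collected results could look like the sketch below. The four check names and the table layout follow the template above; the data structure, the function names, and the choice to report a missing check as "fail / not run" are assumptions.

```python
from typing import Dict, Tuple

CHECKS = ["Known Answers", "Reasonableness", "Edge Cases", "AI-Specific"]

# section name -> check name -> (status, evidence)
Results = Dict[str, Dict[str, Tuple[str, str]]]


def render_validation(sections: Results) -> str:
    """Render all sections into one markdown document."""
    lines = ["# Validation Results", ""]
    for name, results in sections.items():
        lines += [
            f"## {name}",
            "",
            "| Check | Status | Evidence |",
            "|-------|--------|----------|",
        ]
        for check in CHECKS:
            # A check that was never run is reported as a failure.
            status, evidence = results.get(check, ("fail", "not run"))
            lines.append(f"| {check} | {status} | {evidence} |")
        lines.append("")
    return "\n".join(lines)


def all_sections_pass(sections: Results) -> bool:
    """True only when every check in every section is marked pass."""
    return all(
        results.get(check, ("fail", ""))[0] == "pass"
        for results in sections.values()
        for check in CHECKS
    )
```

`all_sections_pass` gives a simple gate for suggesting the export package; treating a missing check as a failure avoids declaring success on partial evidence.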

Present results to the developer:

"Validation Results for [SectionName]:

| Check | Status | Evidence |
|-------|--------|----------|
| Known Answers | pass/fail | [what was compared] |
| Reasonableness | pass/fail | [developer's judgment] |
| Edge Cases | pass/fail | [what was stress-tested] |
| AI-Specific | pass/fail | [what was verified] |
| Documentation | done/skipped | Screenshot saved to [path] |

What would you like to do next?

  • Fix issues found in validation
  • Capture more visual variants (dark mode, mobile)
  • Build another section: [list remaining]
  • Generate the export package (if all sections done)"

Proactive Flow

As a Cognition Mate:

  • Run benchmark tests automatically before asking for judgment calls
  • Flag suspicious values proactively ("This return seems unusually high...")
  • Be honest about your own uncertainty — don't validate yourself
  • Present evidence clearly so the developer can exercise judgment quickly
  • If all sections pass validation, suggest generating the export

Guiding Principles

  • Cross-check, don't single-check — Multiple independent angles catch what one misses
  • Numbers before pixels — Validate the data, then document the visuals
  • AI presents, human judges — The Pilot-in-Command makes the reasonableness call
  • Actively try to break it — "What would prove this wrong?" is more valuable than "Does this look right?"

Install

Requires askill CLI v1.0+

AI Quality Score

86/100 (analyzed 2/23/2026)

Well-structured validation skill with comprehensive methodology covering known answers, reasonableness, edge cases, and AI-specific blind spots. Features clear 7-step flow, red flags table, and structured output format. Located in dedicated skills folder with good metadata. Slightly project-specific (references .driver.json) but methodology is broadly applicable.


Metadata

License: unknown
Version: -
Updated: 2/18/2026
Publisher: CinderZhang

Tags

ci-cd, llm, testing