askill
skill-eval

skill-evalSafety 95Repository

Evaluate all workspace-hub skills for structural validity, content quality, cross-reference integrity, and registry consistency. Runs 18 checks across critical, warning, and info severity levels with actionable fix suggestions.

1 stars
1.2k downloads
Updated 2/15/2026

Package Files

Loading files...
SKILL.md

Skill Eval

Quick Start

# Full evaluation of all skills
uv run .claude/skills/development/skill-eval/scripts/eval-skills.py

# JSON output
uv run .claude/skills/development/skill-eval/scripts/eval-skills.py --format json

# Single skill
uv run .claude/skills/development/skill-eval/scripts/eval-skills.py --skill testing-tdd-london

# Only critical issues
uv run .claude/skills/development/skill-eval/scripts/eval-skills.py --severity critical

When to Use

  • Auditing skill quality after bulk creation or migration
  • Pre-release validation of the skills library
  • CI/CD quality gate for skill changes
  • Identifying broken cross-references after renaming skills
  • Checking compliance with v2 SKILL.md template format

Core Concepts

Evaluation Dimensions

The evaluator runs 18 checks organized into three severity levels:

Critical (blocks skill usage):

  • YAML frontmatter exists and parses
  • Required fields present: name, description

Warning (degrades quality):

  • Version follows semver, category matches directory
  • Required content sections present (Quick Start, When to Use, etc.)
  • Code blocks in key sections, no TODO/FIXME markers
  • Cross-references in related_skills resolve to real skills

Info (improvement opportunities):

  • Uses v2 template format, has optional sections (Metrics, etc.)
  • No duplicate skill names across the library

Report Output

Reports include:

  • Summary with pass/fail counts and percentages
  • Issues grouped by severity with counts
  • Per-category breakdown
  • Top 10 most common issues
  • Per-skill details with actionable fix suggestions

Usage Examples

Full Evaluation

uv run .claude/skills/development/skill-eval/scripts/eval-skills.py

Output:

================================================================
  SKILL EVALUATION REPORT
  Generated: 2026-01-29T14:30:00+00:00
================================================================

SUMMARY
----------------------------------------
  Total skills evaluated:  230
  Passed (no critical):    187  (81.3%)
  Warnings only:           102  (44.3%)
  Critical failures:        43  (18.7%)

JSON for CI/CD

uv run .claude/skills/development/skill-eval/scripts/eval-skills.py \
  --format json --severity critical \
  --output reports/skill-eval.json

Filter by Category

uv run .claude/skills/development/skill-eval/scripts/eval-skills.py --category development

Summary Only

uv run .claude/skills/development/skill-eval/scripts/eval-skills.py --summary-only

Best Practices

  • Run after creating new skills with /skill-creator to validate structure
  • Use --format json in CI pipelines for machine-readable output
  • Address critical issues first (missing frontmatter, invalid YAML)
  • Use --severity warning to focus on actionable improvements
  • Run --category filters for focused audits of specific skill areas

Error Handling

Exit CodeMeaning
0All skills pass (no critical issues)
1Critical failures found
2Script error (missing directory, invalid arguments)
Common IssueCauseFix
frontmatter_missingSkill uses legacy heading formatAdd --- delimited YAML frontmatter
yaml_invalidSyntax error in frontmatterFix YAML syntax (check colons, indentation)
related_skill_unresolvedReferenced skill name doesn't existCorrect the name or remove the reference
section_missingMissing required H2 sectionAdd the section heading and content

Metrics & Success Criteria

MetricTarget
Skills passing all critical checks100%
Skills with complete v2 sections>80%
Resolved cross-references100%
No TODO/FIXME in skills100%

Version History

  • 1.0.0 (2026-01-29): Initial release with 18 checks, human + JSON output, category/skill filtering

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

94/100Analyzed 2/19/2026

High-quality skill evaluation tool with 18 checks across severity levels. Comprehensive documentation with Quick Start, When to Use, core concepts, usage examples, error handling, and metrics. Provides both human-readable and JSON output for CI/CD integration. Tags (ci-cd, llm, observability) improve discoverability. Structured as a technical reference with actionable commands and fix suggestions. The skill is reusable across different skill libraries, not tied to a single project's content.

95
95
90
95
95

Metadata

Licenseunknown
Version1.0.0
Updated2/15/2026
Publishermajiayu000

Tags

ci-cdllmobservability