askill
sparta-review

sparta-reviewSafety 95Repository

Comprehensive SPARTA dataset assessment driven by Brandon Bailey persona. Unified skill merging review-sparta (final assessment) and reality-check-sparta (iterative self-correction). Brandon can /dogpile for grey-area research and /ask colleagues for cross-persona consultation.

1 stars
1.2k downloads
Updated 3/6/2026

Package Files

Loading files...
SKILL.md

sparta-review

Unified SPARTA quality assessment driven by the Brandon Bailey persona. Merges final production assessment with iterative self-correction, and gives Brandon access to /dogpile for research and /ask for cross-persona consultation.

Brandon Bailey Persona

"I created SPARTA to give the space community a common language for discussing threats. When I find a gap, I don't just flag it — I research it through /dogpile and ask my colleagues what they think. A real review is collaborative, not a checklist."

Brandon's Colleagues (via /ask consult)

ColleagueExpertiseWhen Brandon Asks
SOC AnalystReal-world threat detection"Is this attack vector realistic?"
EmbryFormal verification"Can we prove this control works?"
NIST AuditorCompliance frameworks"Does this NIST mapping hold up?"
Red Team LeadOffensive security"How would you exploit this gap?"

Brandon's Research (via /dogpile)

When a QRA falls in a grey area (Tier 3), Brandon triggers /dogpile to:

  • Search Brave for real-world space security incidents
  • Query Perplexity for synthesized analysis
  • Pull ArXiv papers on space system vulnerabilities
  • Check MITRE ATT&CK for technique validation
  • Return structured citations (URL, title, excerpt, relevance)

Commands

Assessment (from review-sparta)

# Full Brandon assessment (all dimensions)
./run.sh assess --run-id run-recovery-verify --full

# Focus on specific dimensions
./run.sh assess --run-id run-recovery-verify --focus qra_quality,cwe_relevance

# Quick samples-only check
./run.sh assess --run-id run-recovery-verify --samples 50

# Compare two runs
./run.sh compare run-v1 run-v2 --dimension qra_quality

# Export report
./run.sh assess --run-id run-recovery-verify --full --report assessment.md

Self-Correction (from reality-check-sparta)

# Adversarial check with fix suggestions
./run.sh check --run-id run-recovery-verify --samples 20

# Self-correction loop (check → fix → recheck)
./run.sh iterate --run-id run-recovery-verify

# Automated iteration until clean
./run.sh loop --run-id run-recovery-verify

# Auto-fix: identify and delete bad QRAs
./run.sh auto-fix --run-id run-recovery-verify

Monitoring

# Watch batch and trigger checks at QRA checkpoints
./run.sh watch --run-id run-recovery-verify --checkpoint 10000

# Track convergence over time
./run.sh convergence

# Quick status
./run.sh status --run-id run-recovery-verify

Research & Consultation

# Brandon researches a grey area via /dogpile
./run.sh research "RF jamming countermeasures for LEO constellations"

# Brandon consults a colleague
./run.sh consult "SOC Analyst" "Is this GPS spoofing technique realistic for commercial satellites?"

# Brandon consults multiple colleagues
./run.sh consult "Red Team Lead" "How would you exploit this ground segment weakness?" \
  --also-ask "SOC Analyst,NIST Auditor"

Reports

# Client-facing assessment report
./run.sh report --run-id run-recovery-verify

# Past findings from /memory
./run.sh history

Review Dimensions

DimensionWeightWhat Brandon Checks
QRA Quality25%Verbatim grounding, citation accuracy, no hallucination
Source Fidelity20%Does DB exactly match SPARTA-Data.xlsx?
CWE Relevance20%Are CWEs applicable to space/embedded systems?
Cross-Reference15%MITRE ATT&CK, NIST 800-53, D3FEND accuracy
Coverage10%All 216 techniques, 91 countermeasures represented?
Control Quality10%Control-to-control comparisons meaningful?

Dynamic Thresholds (Annealing Schedule)

Brandon adjusts standards based on corpus size:

PhaseQRA RangeAnchoring FailGeneric FailGrounding MinBrandon Says
Bootstrap0-5K50%80%0.50"Let's see what we're working with"
Early Growth5K-15K40%70%0.55"Time to raise the bar"
Mid Growth15K-40K35%65%0.60"No more excuses"
Late Growth40K-80K30%60%0.65"Tightening the screws"
Refinement80K-100K25%55%0.70"Time to be strict"
Gold Standard100K+20%50%0.75"No compromises"

Three-Tier Intelligence

TierSourceHow Brandon Uses It
Tier 1 (Vetted)SPARTA QRAs, grounding >= 0.75Direct citation in assessment
Tier 2 (Adapted)ATT&CK/CWE adapted to space, >= 0.70Inference with caveat
Tier 3 (Researched)/dogpile at query timeStructured citations from web/papers

QRA Convergence Model

QRA quality improvement follows the same dynamics as model training convergence. The pipeline converges the datalake (4,017 controls, 77,528 relationships, 46K knowledge excerpts) into high-quality QRAs through iterative refinement.

ML TrainingSPARTA QRA Pipeline
Training dataSPARTA controls, relationships, knowledge excerpts
Model weightsQRA corpus (generated answers)
Loss functionBrandon's issue count (anchoring failures, grounding gaps)
Learning ratePrompt aggressiveness (how much we demand per QRA)
Gradient descentgenerate → assess → fix prompts → regenerate
EpochOne convergence cycle (10K QRA checkpoint)
OverfittingGaming thresholds / lowering standards (NEVER DO THIS)
PlateauPrompt ceiling — need /prompt-lab to redesign prompts
Validation setBrandon's adversarial spot checks (not the same data)
Early stoppingQuality converged — stop changing prompts

Convergence Rules

  1. Issue count MUST decrease cycle over cycle (like loss decreasing)
  2. 3 consecutive regressions = stalled → human intervention (like divergent training)
  3. Plateau = prompt ceiling → use /prompt-lab to redesign (like changing architecture)
  4. NEVER lower thresholds to game the curve (like data leakage)
  5. Track metrics over time via convergence_state.json (like TensorBoard)

Full Convergence Loop

# Autonomous (runs for days):
./run.sh converge --run-id run-recovery-verify --checkpoint 10000 --target 90000

# The converge command orchestrates:
# 1. Monitor QRA count → wait for checkpoint
# 2. Snapshot DB → Brandon assessment
# 3. If PASS → continue to next checkpoint
# 4. If FAIL → auto-fix (delete bad QRAs) → recalibrate prompts → restart
# 5. Track convergence (issue count should decrease per cycle)
# 6. Stop when: target reached, quality converged, or stuck

Manual Quality Gate (every 10K QRAs)

1. ./run.sh assess --full --store           # Brandon assessment
2. ./run.sh auto-fix                        # Delete bad QRAs
3. ./run.sh recalibrate                     # Optimize prompts via /prompt-lab
4. ./run.sh convergence                     # Verify quality improving
5. [restart generation with improved prompts]

Grading Scale

GradeCriteria
A+ EXCELLENT<20% generic, 100% source fidelity, >0.9 grounding
A GOOD<30% generic, 95%+ source fidelity, >0.85 grounding
B ACCEPTABLE<50% generic, 90%+ source fidelity, >0.80 grounding
C NEEDS WORK<70% generic, 80%+ source fidelity, >0.70 grounding
F FAIL>70% generic OR major fidelity issues

Integration

SkillHow Brandon Uses It
/dogpileGrey-area research with structured citations
/ask consultCross-persona consultation (SOC Analyst, Red Team, etc.)
/memoryStore/recall findings, learn from past assessments
/taxonomyBridge attribute extraction for QRA classification
/extractorVerify source URLs extractable
/fetcherFresh URL content verification
/task-monitorReport assessment status

Environment Variables

VariablePurposeDefault
SPARTA_SOURCE_PATHPath to SPARTA-Data.xlsxdata/source/SPARTA-Data.xlsx
CHUTES_API_KEYFor /scillm LLM calls(required for research)
BRANDON_STRICT_MODEFail on any warningfalse

Relationship to Deprecated Skills

This skill replaces both:

  • review-sparta — assessment logic merged into assess command
  • reality-check-sparta — iteration logic merged into check/iterate/loop/auto-fix

Both old skills should be considered deprecated in favor of this unified skill.

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

88/100Analyzed 2/23/2026

Highly polished, comprehensive skill for SPARTA dataset quality assessment. Excellent structure with persona-driven approach, detailed command documentation, ML-inspired convergence model, and clear grading criteria. Strong on actionability, clarity, and completeness. Minor扣分 for domain specificity limiting reusability beyond SPARTA context. Bonus points for triggers section, structured commands, metadata, and skills folder location.

95
92
55
98
90

Metadata

Licenseunknown
Version-
Updated3/6/2026
Publishergrahama1970

Tags

ci-cdllmobservabilitypromptingsecurity