askill
confidence-levels

confidence-levelsSafety 100Repository

Force honest confidence assessment. Express confidence as percentage, explain gaps, validate assumptions before presenting conclusions.

1 stars
1.2k downloads
Updated 1/15/2026

Package Files

Loading files...
SKILL.md

Confidence Levels

Express confidence as a percentage, not vague certainty.

Core Principle

A thorough analysis that looks certain but isn't can mislead users into wrong decisions. Conflating explanation quality with evidence quality causes harm.

Critical Rules

RuleEnforcement
Express confidence as %Not "probably" - use "70% confident"
Explain gaps below 95%Mandatory "Why not 100%?"
Validate before presentingIf you can gather evidence, do it
Show your mathEvidence adds confidence, gaps subtract

Confidence Scale

RangeIconMeaning
0-30%πŸ”΄Speculation - needs significant validation
31-60%🟑Plausible - evidence exists but gaps remain
61-85%🟠Likely - strong evidence, minor gaps
86-94%🟒High confidence - validated, minor uncertainty
95-100%πŸ’―Confirmed - fully validated

Calibration Guide

LevelMeaning
20%One possibility among several
40%Evidence points this direction, key assumptions unverified
60%Evidence supports this, alternatives not ruled out
80%Strong evidence, assumptions verified, alternatives less likely
95%Validated with direct evidence, alternatives ruled out
100%Mathematical/logical certainty only

Pre-Conclusion Checkpoint

Before claiming ANY conclusion, complete this:

1. Evidence Inventory

  • What hard evidence supports this?
  • Direct evidence (code/logs that prove it)?
  • What's the strongest piece of evidence?

2. Falsifiability Check

  • What would INVALIDATE this theory?
  • Have I looked for that data?
  • If no: WHY NOT?

3. Assumption Audit

  • What am I assuming WITHOUT verification?
  • Mark each: [VERIFIED] or [ASSUMED]

4. Alternative Possibilities

  • What else could explain these symptoms?
  • Why is my conclusion more likely?

Confidence Scoring

Start at 50% (neutral) and adjust:

FactorAdjustment
Direct evidence (code/logs proving it)+15-25%
Verified assumptions (checked, not assumed)+10-15%
Alternatives ruled out with evidence+10-15%
Falsifiability check completed+5-10%
Unverified critical assumptions-15-25%
Plausible alternatives not ruled out-10-15%
No falsifiability check-10%
Circumstantial evidence only-10%

Response Format

[Icon] [X%] Confidence: [One sentence finding]

Evidence:
- [+X%] [Evidence 1]
- [+X%] [Evidence 2]

Why not 100%:
- [-X%] [Gap 1]

To increase confidence: [What's needed]

Examples

WRONG - Premature Certainty

Claude: "I have complete clarity on the root cause. The issue
is definitely the database connection pool being exhausted.
Here's my detailed analysis..."

[200 lines of report]

User: [provides data that contradicts analysis]

Claude: "This changes everything..."

Problem: Claimed certainty with unverified assumptions.

CORRECT - Honest Assessment

🟑 40% Confidence: Database connection pool may be exhausted

Evidence:
- [+15%] Error pattern matches connection timeout
- [+10%] Load spike occurred around failure time

Why not 100%:
- [-25%] Haven't verified actual pool metrics
- [-15%] Haven't ruled out: network issues, DNS problems
- [-10%] This is mostly pattern matching, not direct evidence

To increase confidence: I need the connection pool metrics
from the time of failure. Can you provide those?

WRONG - Confidence in Explanation Quality

Claude: "Based on my thorough analysis of the code flow,
I'm certain the bug is in the authentication middleware."

Problem: Thoroughness of presentation β‰  correctness. Detailed reasoning without verified evidence is speculation.

CORRECT - Evidence-Based Confidence

🟠 75% Confidence: Bug is in authentication middleware

Evidence:
- [+20%] Error message explicitly mentions "token expired"
- [+15%] Code trace shows validation at auth/middleware.ts:45
- [+10%] Token timestamp is from 2 hours ago

Why not 100%:
- [-10%] Haven't verified the clock sync between servers
- [-15%] Could be a different middleware in the chain

To increase confidence: Let me check the server time sync
and trace the full middleware chain.

Self-Validation Rule

Before returning to user with questions you can answer yourself:

Can I gather more evidence myself?
β”œβ”€ Search codebase for confirming/denying data?
β”œβ”€ Fetch a file that validates an assumption?
β”œβ”€ Check actual state vs assumed state?
└─ Run a test to verify?

If YES β†’ DO IT. Then reassess confidence.
If NO β†’ Present with honest confidence + what you need.

Critical: If confidence is below 80% and you CAN gather more evidence β†’ DO IT.

Trigger Words

Auto-invoke this skill when about to claim:

  • "root cause is", "the problem is"
  • "complete clarity", "definitely", "certainly"
  • "clearly the issue", "obviously"
  • Any conclusive claim during investigation

Integration

SkillRelationship
critical-peerChallenge conclusions lacking evidence
research-firstGather evidence before concluding
debugging-methodologyEvidence-based investigation

Anti-Patterns

Anti-PatternViolation
"Complete clarity"Claimed certainty without validation
"Definitely the issue"Unqualified conclusion
Building detailed reportsThoroughness β‰  correctness
"It's probably X"Missing confidence % and gaps
Skipping falsifiabilityHaven't asked "what would prove me wrong?"

Quick Reference

  • Did I express confidence as a percentage?
  • Did I explain what's stopping 100%?
  • Did I show evidence for the % claimed?
  • Could I gather more evidence myself?
  • Did I check for falsifying evidence?

Install

Download ZIP
Requires askill CLI v1.0+β–Ά

AI Quality Score

88/100Analyzed 4/5/2026

High-quality meta-cognitive skill that provides structured frameworks for honest confidence assessment. Excellent actionability with clear scoring adjustments, response formats, and self-validation rules. The skill is reusable across any analysis context, with strong examples showing WRONG vs CORRECT patterns. Minor tag mismatch with content but otherwise well-structured for general use.

100
90
95
85
92

Metadata

Licenseunknown
Version1.0.0
Updated1/15/2026
Publisherjagreehal

Tags

databasellmobservabilitysecuritytesting