Hype Assessment Skill
Assess which AI topics are overhyped, underhyped, or accurately assessed based on synthesized claims.
Assessment Framework
Overhyped Topics (Lab enthusiasm exceeds warranted confidence)
Signs of overhype:
- Lab researchers make strong claims that critics have substantively challenged
- Evidence quality is low but confidence is high
- Past predictions in this area have repeatedly failed
- Marketing language exceeds technical substance
- Hype delta > +0.3
Underhyped Topics (Critic skepticism may be excessive)
Signs of underhype:
- Real progress has been made but critics haven't updated
- Evidence is strong but narrative hasn't caught up
- Lab hints suggest unreleased capabilities
- Quiet progress without announcements
- Hype delta < -0.3
Accurately Assessed Topics
Signs of accurate assessment:
- Lab and critic views are relatively aligned
- Claims match observable evidence
- Predictions have been reasonably accurate
- Hype delta between -0.2 and +0.2
Scoring System
For each topic, assign a score from -1.0 to +1.0:
| Score | Meaning |
|---|---|
| +1.0 | Severely overhyped - massive gap between claims and reality |
| +0.5 | Moderately overhyped - lab enthusiasm outpaces evidence |
| +0.2 | Slightly overhyped |
| 0.0 | Accurately assessed |
| -0.2 | Slightly underhyped |
| -0.5 | Moderately underhyped - real progress being underrated |
| -1.0 | Severely underhyped - major developments being ignored |
Evidence to Consider
For Overhyped Assessment
- Repeated failed predictions
- Marketing claims exceeding published results
- Hype cycle patterns (lots of announcements, few deliverables)
- Benchmark gaming without real-world transfer
- "Just around the corner" claims that keep slipping
For Underhyped Assessment
- Steady progress without fanfare
- Working deployments with limited publicity
- Academic results that haven't reached mainstream
- Capabilities that exist but aren't marketed
- Legitimate breakthroughs dismissed by critics
For Accurate Assessment
- Claims that held up over time
- Convergence between lab and critic views
- Predictions that came true
- Honest acknowledgment of limitations
- Nuanced discussion of tradeoffs
Output Format
Return JSON:
{
"overhypedTopics": [
{
"topic": "agents",
"score": 0.6,
"reasoning": "Lab enthusiasm for autonomous agents significantly exceeds demonstrated reliability. Multiple high-profile failures in production while claims of imminent AGI-like autonomy persist.",
"keyEvidence": [
"Devin and similar demos failed to replicate",
"Production agent deployments have high failure rates",
"Claims of 'replacing developers' haven't materialized"
]
}
],
"underhypedTopics": [
{
"topic": "interpretability",
"score": -0.5,
"reasoning": "Significant progress on mechanistic interpretability is being made at Anthropic and elsewhere, but mainstream coverage focuses on capabilities. Real tools for understanding models are emerging.",
"keyEvidence": [
"Golden Gate Claude demonstrated genuine steering",
"Feature extraction becoming reproducible",
"SAEs showing practical utility"
]
}
],
"accuratelyAssessedTopics": [
{
"topic": "multimodal",
"score": 0.1,
"reasoning": "Vision-language models have improved substantially and assessments largely reflect actual capabilities. Both enthusiasm and concerns are grounded.",
"keyEvidence": [
"GPT-4V and Claude vision work as advertised",
"Known limitations acknowledged",
"Incremental improvements match expectations"
]
}
],
"overallFieldSentiment": 0.72,
"summary": "A paragraph summarizing the overall hype landscape..."
}
Overall Field Sentiment
Calculate as weighted average of lab researcher bullishness across all topics (0.0-1.0).
Interpretation:
- 0.8-1.0: Extremely bullish field sentiment (potential bubble)
- 0.6-0.8: Optimistic but measured
- 0.4-0.6: Balanced/uncertain
- 0.2-0.4: Cautious/skeptical
- 0.0-0.2: Pessimistic
Summary Guidelines
Write a single paragraph summarizing:
- Overall hype temperature
- Most overhyped area and why
- Most underhyped area and why
- What sophisticated observers should pay attention to
Tone: Direct, opinionated but fair, grounded in evidence.
