Token Management & Compression Strategies
Comprehensive guide for managing token budget, applying tier-appropriate compression, and optimizing context window usage.
Overview
The context window is a public good: every token competes with other information for space.
This skill provides:
- Token budget monitoring strategies
- Tier-aware compression ratios (T1: 5-10x, T2: 2-5x, T3+: minimal)
- Observation masking rules
- Semantic preservation guidelines (LLMLingua-2)
- Budget overflow protocols
Token Budget Awareness (T2+)
Built-in Monitoring
# Check current project stats
opencode stats --project ""
# Check last 7 days with model breakdown
opencode stats --days 7 --models 5
# Full breakdown
opencode stats --days 30 --models 10 --tools 10
Budget Action Triggers
Heuristic Triggers
| Event | Action |
|---|---|
| Phase boundary | Run opencode stats, summarize to task_plan.md |
| 3+ long tool outputs | Consider notes.md offload |
| Error investigation >2 attempts | Document state, check stats |
| Research accumulated | Transfer to notes.md |
| Before /clear | Run stats to log, then clear |
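The trigger table above can be read as a simple rule check. The sketch below is one way to encode it, assuming a hypothetical `SessionState` record that you maintain yourself; neither the class nor its field names come from opencode.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    """Hypothetical snapshot of the session (not an opencode API)."""
    at_phase_boundary: bool = False
    long_tool_outputs: int = 0
    error_attempts: int = 0
    research_accumulated: bool = False
    about_to_clear: bool = False


def triggered_actions(state: SessionState) -> list:
    """Return the actions from the trigger table that apply to `state`."""
    actions = []
    if state.at_phase_boundary:
        actions.append("Run opencode stats, summarize to task_plan.md")
    if state.long_tool_outputs >= 3:      # "3+ long tool outputs"
        actions.append("Consider notes.md offload")
    if state.error_attempts > 2:          # "Error investigation >2 attempts"
        actions.append("Document state, check stats")
    if state.research_accumulated:
        actions.append("Transfer to notes.md")
    if state.about_to_clear:
        actions.append("Run stats to log, then clear")
    return actions


print(triggered_actions(SessionState(long_tool_outputs=4, error_attempts=3)))
```

The thresholds mirror the table (three or more long outputs, more than two error attempts); adjust them if your own heuristics differ.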
Budget Overflow Protocol
When context fills up (75%+ usage):
1. Run assessment:
   opencode stats --project ""
2. Offload research findings:
   - Transfer to notes.md
   - Keep executive summary in context
3. Summarize completed phases:
   - Update task_plan.md with phase summaries
   - Archive detailed exploration notes
4. Store key learnings (load skill sia-code/decision-trace for structured format):
   uvx sia-code memory add-decision "[Category]: [Decision]. Context: [trigger]. Reasoning: [why]. Outcome: [result]."
5. If still overloaded:
   - /clear and restore from task_plan.md
   - Re-establish context from plan + notes
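As a rough illustration of the 75% threshold, the sketch below compares a token count against a window size and returns the protocol steps in order. How you obtain `used_tokens` and `window_tokens` is left open; both parameters and the function name are assumptions made for this example.

```python
OVERFLOW_THRESHOLD = 0.75  # "75%+ usage" from the protocol above

PROTOCOL_STEPS = [
    'Run assessment: opencode stats --project ""',
    "Offload research findings to notes.md",
    "Summarize completed phases in task_plan.md",
    "Store key learnings in sia-code memory",
    "If still overloaded: /clear and restore from task_plan.md",
]


def overflow_steps(used_tokens: int, window_tokens: int) -> list:
    """Return the protocol steps if usage has reached the threshold, else []."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    usage = used_tokens / window_tokens
    return list(PROTOCOL_STEPS) if usage >= OVERFLOW_THRESHOLD else []


# Example: 150k tokens used in a 200k window is exactly 75%, so the protocol applies.
for step in overflow_steps(150_000, 200_000):
    print(step)
```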
Observation Masking (Tier-Aware)
Long outputs waste tokens. Apply tier-appropriate masking (50%+ cost savings):
Masking Rules by Tier
| Output Type | T1 (Simple) | T2 (Moderate) | T3+ (Complex) |
|---|---|---|---|
| File >100 lines | First 20 + last 10 + matches | Error context + 20 lines | Full structure |
| Command success | Exit code only | Exit code + key metrics | Exit code + full output |
| Command error | Full error + 3 lines | Full error + 5 lines | Full error section |
| Test results | Pass/fail counts | + first 3 failures | + all failures + stacks |
| API response | Schema only | Schema + sample | Full response |
| Build logs | Final 5 lines | Final 10 lines | Full error section |
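To make the T1 rule for long files concrete ("First 20 + last 10 + matches"), here is a minimal sketch. The function name, the `pattern` argument, and the wording of the gap marker are illustrative choices, not part of any tool.

```python
import re


def mask_file_t1(text: str, pattern: str, head: int = 20, tail: int = 10,
                 threshold: int = 100) -> str:
    """Keep the first `head` lines, the last `tail` lines, and regex matches."""
    lines = text.splitlines()
    if len(lines) <= threshold:           # the rule only applies to files >100 lines
        return text
    keep = set(range(head)) | set(range(len(lines) - tail, len(lines)))
    rx = re.compile(pattern)
    keep |= {i for i, line in enumerate(lines) if rx.search(line)}

    out = []
    prev = -1
    for i in sorted(keep):
        if i > prev + 1:                  # mark each gap with a masked-line count
            out.append(f"... [{i - prev - 1} lines masked] ...")
        out.append(lines[i])
        prev = i
    return "\n".join(out)


source = "\n".join(f"line {n}" for n in range(1, 201))
print(mask_file_t1(source, r"line 57$"))
```

Recording how many lines each gap hides leaves a cue that the full file can be re-read if the masked region turns out to matter.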
Semantic Preservation
Always keep:
- Function signatures
- Error lines
- Imports
- Class definitions
Safe to compress:
- Repeated patterns
- Verbose comments
- Whitespace
Never discard:
- The exact line referenced in errors
Compression Strategy (Tier-Aware)
Semantic Preservation Rules (LLMLingua-2)
Core Principles
Always keep:
- Function signatures
- Imports
- Class definitions
- Error lines
- Variable declarations (in scope)
Compress safely:
- Repeated patterns
- Verbose comments
- Extensive whitespace
- Boilerplate code
Never discard:
- The exact line referenced in errors
- Function/class definitions in error stack
- Import statements causing issues
Example: T1 Compression
Original (100 lines):
# Long file with verbose comments
import os
import sys
import json
def process_data(data):
    """
    This function processes data by doing X, Y, and Z.
    It takes a data parameter and returns processed result.
    ... (50 lines of docstring) ...
    """
    # Implementation details...
    result = transform(data)
    return result
# ... 80 more lines ...
T1 Compressed:
import os, sys, json
def process_data(data):
    result = transform(data)
    return result
# ... [80 lines compressed] ...
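A compression like the one above can be approximated for Python sources with the standard library alone. The sketch below is a rule-based stand-in, not LLMLingua-2: it parses the file, removes docstrings, and regenerates the code, which also drops comments and extra whitespace while keeping imports, signatures, and class definitions. The function name is hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


def strip_docstrings_and_comments(source: str) -> str:
    """Return `source` without docstrings, comments, or extra blank lines."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.FunctionDef,
                             ast.AsyncFunctionDef, ast.ClassDef)):
            body = node.body
            if (body and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                # Drop the docstring; keep the body non-empty so it stays valid.
                node.body = body[1:] or [ast.Pass()]
    # Comments never reach the AST, so unparse() omits them automatically.
    return ast.unparse(tree)


original = '''# Long file with verbose comments
import os
import sys

def process_data(data):
    """Processes data by doing X, Y, and Z."""
    # Implementation details...
    result = transform(data)
    return result
'''
print(strip_docstrings_and_comments(original))
```

Because regeneration renumbers lines, apply it only to files that no current error message refers to by line number.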
Example: T3 Architecture Task
Original: Keep FULL
Reasoning: Architecture decisions require understanding full context, including:
- All class relationships
- Method signatures
- Inheritance hierarchies
- Complex logic flow
Context Stability
Keep Fixed
- AGENTS.md rules (this system prompt)
- Current task goal from task_plan.md
- Active phase objectives
Summarize at Boundaries
- Completed phases (executive summary in task_plan.md)
- Exploration findings (detailed in notes.md)
- Research notes (transfer to notes.md or sia-code memory)
Archive Aggressively
- Old tool outputs (>3 turns ago, unless actively referenced)
- Completed explorations
- Resolved error investigations
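The rule for old tool outputs (older than 3 turns, unless actively referenced) can be sketched as a small filter. The `ToolOutput` record and its `referenced` flag are assumptions for illustration; how a reference is detected is up to you.

```python
from dataclasses import dataclass


@dataclass
class ToolOutput:
    """Hypothetical record of one tool result held in context."""
    turn: int                 # turn number at which the output was produced
    text: str
    referenced: bool = False  # True if later work still depends on it


def archive_old_outputs(outputs, current_turn, max_age=3):
    """Split outputs into (kept, archived) using the >3-turns rule."""
    kept, archived = [], []
    for output in outputs:
        too_old = current_turn - output.turn > max_age
        if too_old and not output.referenced:
            archived.append(output)
        else:
            kept.append(output)
    return kept, archived


history = [
    ToolOutput(turn=1, text="ls output"),
    ToolOutput(turn=1, text="stack trace", referenced=True),
    ToolOutput(turn=4, text="test run"),
]
kept, archived = archive_old_outputs(history, current_turn=5)
print([o.text for o in kept], [o.text for o in archived])
```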
Recovery Strategies
Pre-Clear Protocol
Before running /clear:
1. Document current state:
   - Update task_plan.md with exact position
   - Log next 2 steps clearly
   - Store context-critical insights in sia-code memory
2. Run stats:
   opencode stats --project ""
3. Archive findings:
   - Transfer research to notes.md
   - Store learnings in sia-code memory
4. Mark checkpoint in plan:
   ## Checkpoint: Before Clear
   - Position: [exact step]
   - Next: [next 2 steps]
   - Critical context: [key info]
Post-Clear Recovery
After /clear:
1. Read task_plan.md:
   - Find "Checkpoint: Before Clear" or current position
   - Understand completed phases
2. Restore TodoWrite:
   - Initialize with remaining steps
   - Mark prior phases as completed
3. Resume from position:
   - Continue from exact step
   - Reference notes.md as needed
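The checkpoint written before /clear and read back afterwards can be treated as a small round trip. The sketch below assumes the checkpoint is stored as a "## Checkpoint: Before Clear" heading followed by "- Key: value" lines, as in the pre-clear protocol; both helper functions are hypothetical.

```python
import re

HEADER = "## Checkpoint: Before Clear"


def render_checkpoint(position, next_steps, critical):
    """Build the checkpoint block to append to task_plan.md."""
    return "\n".join([
        HEADER,
        f"- Position: {position}",
        f"- Next: {'; '.join(next_steps)}",
        f"- Critical context: {critical}",
    ])


def parse_checkpoint(plan_text):
    """Return the fields of the last checkpoint in `plan_text`, or None."""
    start = plan_text.rfind(HEADER)
    if start == -1:
        return None
    fields = {}
    for line in plan_text[start + len(HEADER):].splitlines():
        if line.startswith("#"):          # the next heading ends the checkpoint
            break
        match = re.match(r"- ([^:]+): (.*)", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


plan = "# Plan\n\n" + render_checkpoint(
    "Phase 2, step 3", ["run tests", "update docs"], "fixtures live in tests/data"
)
print(parse_checkpoint(plan))
```

Reading the last checkpoint rather than the first means repeated clears in one task resume from the most recent position.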
Best Practices
DO
✅ Monitor at phase boundaries (opencode stats)
✅ Offload research to notes.md early (not at 90%)
✅ Store learnings in sia-code memory (not in context)
✅ Match compression to tier (T1: heavy, T3: light)
✅ Keep errors at full fidelity (always)
DON'T
❌ Wait until forced to /clear (proactive offloading)
❌ Apply same compression to all tiers (tier-aware)
❌ Compress error outputs (always full)
❌ Lose investigation progress (store first, then /clear)
❌ Forget to log stats before /clear (tracking)
Quick Reference
When to Check Stats
- ☐ Phase boundaries
- ☐ After 3+ long tool outputs
- ☐ Before /clear
- ☐ Error investigation >2 attempts
Compression Ratios
- T1: 5-10x (aggressive)
- T2: 2-5x (moderate)
- T3: 1-2x (light)
- T4: None (full fidelity)
- Errors: FULL (always)
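As a worked example of these ratios, the sketch below turns a tier into a target size range for a given input. The function and table names are illustrative, and the token counts are whatever estimate you already have.

```python
# (min_ratio, max_ratio) per tier, taken from the list above.
RATIOS = {"T1": (5, 10), "T2": (2, 5), "T3": (1, 2), "T4": (1, 1)}


def target_token_range(original_tokens: int, tier: str, is_error: bool = False):
    """Return (smallest, largest) acceptable size after compression."""
    if is_error:                          # errors always stay at full fidelity
        return (original_tokens, original_tokens)
    low, high = RATIOS[tier]
    return (original_tokens // high, original_tokens // low)


for tier in ("T1", "T2", "T3", "T4"):
    print(tier, target_token_range(10_000, tier))
```

For a 10,000-token output this gives 1,000 to 2,000 tokens at T1, 2,000 to 5,000 at T2, 5,000 to 10,000 at T3, and the full 10,000 at T4.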
Offload Targets
- Research findings → notes.md
- Key learnings → sia-code memory
- Completed phases → task_plan.md summary
- Old tool outputs → Archive (remove from context)
Usage
Load this skill when:
- Context feels bloated (offload guidance)
- Approaching token limits (overflow protocol)
- Uncertain about compression level (tier matching)
- Before /clear (recovery protocols)
- Setting up new task (budget awareness)
