Token Management & Compression Strategies

Comprehensive guide for managing token budget, applying tier-appropriate compression, and optimizing context window usage.

Overview

Context window is a public good - every token competes with other information.

This skill provides:

Token budget monitoring strategies
Tier-aware compression ratios (T1: 5-10x, T2: 2-5x, T3+: minimal)
Observation masking rules
Semantic preservation guidelines (LLMLingua-2)
Budget overflow protocols

Token Budget Awareness (T2+)

Built-in Monitoring

# Check current project stats
opencode stats --project ""

# Check last 7 days with model breakdown
opencode stats --days 7 --models 5

# Full breakdown
opencode stats --days 30 --models 10 --tools 10

Budget Action Triggers

Heuristic Triggers

Event	Action
Phase boundary	Run `opencode stats`, summarize to task_plan.md
3+ long tool outputs	Consider notes.md offload
Error investigation >2 attempts	Document state, check stats
Research accumulated	Transfer to notes.md
Before `/clear`	Run stats to log, then clear

Budget Overflow Protocol

When context fills up (75%+ usage):

Run assessment:
```
opencode stats --project ""
```
Offload research findings:
- Transfer to notes.md
- Keep executive summary in context
Summarize completed phases:
- Update task_plan.md with phase summaries
- Archive detailed exploration notes

Store key learnings (Load skill sia-code/decision-trace for structured format):

uvx sia-code memory add-decision "[Category]: [Decision]. Context: [trigger]. Reasoning: [why]. Outcome: [result]."

If still overloaded:
- /clear and restore from task_plan.md
- Re-establish context from plan + notes

Observation Masking (Tier-Aware)

Long outputs waste tokens. Apply tier-appropriate masking (50%+ cost savings):

Masking Rules by Tier

Output Type	T1 (Simple)	T2 (Moderate)	T3+ (Complex)
File >100 lines	First 20 + last 10 + matches	Error context + 20 lines	Full structure
Command success	Exit code only	Exit code + key metrics	Exit code + full output
Command error	Full error + 3 lines	Full error + 5 lines	Full error section
Test results	Pass/fail counts	+ first 3 failures	+ all failures + stacks
API response	Schema only	Schema + sample	Full response
Build logs	Final 5 lines	Final 10 lines	Full error section

Semantic Preservation

Always keep:

Function signatures
Error lines
Imports
Class definitions

Safe to compress:

Repeated patterns
Verbose comments
Whitespace

Never discard:

The exact line referenced in errors

Compression Strategy (Tier-Aware)

Semantic Preservation Rules (LLMLingua-2)

Core Principles

Always keep:

Function signatures
Imports
Class definitions
Error lines
Variable declarations (in scope)

Compress safely:

Repeated patterns
Verbose comments
Extensive whitespace
Boilerplate code

Never discard:

The exact line referenced in errors
Function/class definitions in error stack
Import statements causing issues

Example: T1 Compression

Original (100 lines):

# Long file with verbose comments
import os
import sys
import json

def process_data(data):
    """
    This function processes data by doing X, Y, and Z.
    It takes a data parameter and returns processed result.
    ... (50 lines of docstring) ...
    """
    # Implementation details...
    result = transform(data)
    return result

# ... 80 more lines ...

T1 Compressed:

import os, sys, json
def process_data(data):
    result = transform(data)
    return result
# ... [80 lines compressed] ...

Example: T3 Architecture Task

Original: Keep FULL

Reasoning: Architecture decisions require understanding full context, including:

All class relationships
Method signatures
Inheritance hierarchies
Complex logic flow

Context Stability

Keep Fixed

AGENTS.md rules (this system prompt)
Current task goal from task_plan.md
Active phase objectives

Summarize at Boundaries

Completed phases (executive summary in task_plan.md)
Exploration findings (detailed in notes.md)
Research notes (transfer to notes.md or sia-code memory)

Archive Aggressively

Old tool outputs (>3 turns ago, unless actively referenced)
Completed explorations
Resolved error investigations

Recovery Strategies

Pre-Clear Protocol

Before running /clear:

Document current state:
- Update task_plan.md with exact position
- Log next 2 steps clearly
- Store context-critical insights in sia-code memory
Run stats:
```
opencode stats --project ""
```
Archive findings:
- Transfer research to notes.md
- Store learnings in sia-code memory

Mark checkpoint in plan:

## Checkpoint: Before Clear
- Position: [exact step]
- Next: [next 2 steps]
- Critical context: [key info]

Post-Clear Recovery

After /clear:

Read task_plan.md:
- Find "Checkpoint: Before Clear" or current position
- Understand completed phases
Restore TodoWrite:
- Initialize with remaining steps
- Mark prior phases as completed
Resume from position:
- Continue from exact step
- Reference notes.md as needed

Best Practices

DO

✅ Monitor at phase boundaries (opencode stats) ✅ Offload research to notes.md early (not at 90%) ✅ Store learnings in sia-code memory (not in context) ✅ Match compression to tier (T1: heavy, T3: light) ✅ Keep errors at full fidelity (always)

DON'T

❌ Wait until forced to /clear (proactive offloading) ❌ Apply same compression to all tiers (tier-aware) ❌ Compress error outputs (always full) ❌ Lose investigation progress (store first, then /clear) ❌ Forget to log stats before /clear (tracking)

Quick Reference

When to Check Stats

☐ Phase boundaries
☐ After 3+ long tool outputs
☐ Before /clear
☐ Error investigation >2 attempts

Compression Ratios

T1: 5-10x (aggressive)
T2: 2-5x (moderate)
T3: 1-2x (light)
T4: None (full fidelity)
Errors: FULL (always)

Offload Targets

Research findings → notes.md
Key learnings → sia-code memory
Completed phases → task_plan.md summary
Old tool outputs → Archive (remove from context)

Usage

Load this skill when:

Context feels bloated (offload guidance)
Approaching token limits (overflow protocol)
Uncertain about compression level (tier matching)
Before /clear (recovery protocols)
Setting up new task (budget awareness)

token-managementSafety 90Repository

Package Files

Token Management & Compression Strategies

Overview

Token Budget Awareness (T2+)

Built-in Monitoring

Budget Action Triggers

Heuristic Triggers

Budget Overflow Protocol

Observation Masking (Tier-Aware)

Masking Rules by Tier

Semantic Preservation

Compression Strategy (Tier-Aware)

Semantic Preservation Rules (LLMLingua-2)

Core Principles

Example: T1 Compression

Example: T3 Architecture Task

Context Stability

Keep Fixed

Summarize at Boundaries

Archive Aggressively

Recovery Strategies

Pre-Clear Protocol

Post-Clear Recovery

Best Practices

DO

DON'T

Quick Reference

When to Check Stats

Compression Ratios

Offload Targets

Usage

Install

AI Quality Score

Metadata

Tags

token-managementSafety 90Repository ShareFavorite skill

Package Files

Token Management & Compression Strategies

Overview

Token Budget Awareness (T2+)

Built-in Monitoring

Budget Action Triggers

Heuristic Triggers

Budget Overflow Protocol

Observation Masking (Tier-Aware)

Masking Rules by Tier

Semantic Preservation

Compression Strategy (Tier-Aware)

Semantic Preservation Rules (LLMLingua-2)

Core Principles

Example: T1 Compression

Example: T3 Architecture Task

Context Stability

Keep Fixed

Summarize at Boundaries

Archive Aggressively

Recovery Strategies

Pre-Clear Protocol

Post-Clear Recovery

Best Practices

DO

DON'T

Quick Reference

When to Check Stats

Compression Ratios

Offload Targets

Usage

Install

AI Quality Score

Metadata

Tags

token-managementSafety 90Repository