ingest-code
Ingest codebases for CWE scanning and knowledge extraction. Extracts taxonomy tags (including CWE mappings) from source files and stores them in /memory for cross-collection multi-hop traversal.
Purpose
This skill bridges codebases with the Federated Taxonomy system:
- Scan source files for security-relevant patterns
- Extract CWEs via /taxonomy (Fragility bridge)
- Store in /memory for recall and multi-hop traversal
- Run as /scheduler job for living document updates
Quick Start
cd .agent/skills/ingest-code
# Scan a codebase
./run.sh scan /path/to/codebase
# Scan with LLM validation (reduces false positives)
./run.sh scan /path/to/codebase --validate
# Dry run (no writes to memory)
./run.sh scan /path/to/codebase --dry-run
# Scan specific file types
./run.sh scan /path/to/codebase --glob "*.py"
Commands
scan - Scan Codebase for CWEs
./run.sh scan <path> [OPTIONS]
Options:
--glob, -g File pattern to scan (default: "*.py *.c *.cpp *.h *.rs *.go *.java *.ts *.js")
--validate Run LLM validation on CWE matches
--dry-run Show what would be stored without writing
--scope Memory scope (default: "code")
--batch-size Files per batch (default: 50)
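As a rough illustration of how --glob and --batch-size interact, the sketch below splits the space-separated pattern string, collects matching files, and yields them in groups of 50. The function names (matches_glob, select_files, batches) are hypothetical and not part of run.sh; the actual file selection logic may differ.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Iterator, List

# Default taken from the --glob option above.
DEFAULT_GLOB = "*.py *.c *.cpp *.h *.rs *.go *.java *.ts *.js"


def matches_glob(name: str, glob: str = DEFAULT_GLOB) -> bool:
    """Return True if the file name matches any space-separated pattern."""
    return any(fnmatch(name, pattern) for pattern in glob.split())


def select_files(root: str, glob: str = DEFAULT_GLOB) -> List[Path]:
    """Recursively collect files under root whose names match the glob list."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and matches_glob(p.name, glob)
    )


def batches(items: List[Path], batch_size: int = 50) -> Iterator[List[Path]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

With these helpers, `for batch in batches(select_files("/path/to/codebase")): ...` would walk a codebase one batch at a time.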
rescan - Nightly Rescan (Scheduler Job)
./run.sh rescan [OPTIONS]
Options:
--since Only files modified since (ISO date or "1d", "7d", etc.)
--validate Run LLM validation
--scope Memory scope
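The --since value accepts either an ISO date or a relative form such as "1d" or "7d". One way to turn that into a cutoff timestamp is sketched below. parse_since is a hypothetical helper, it handles only the day suffix shown above, and it assumes UTC when the ISO date carries no timezone; the real parser may accept more forms.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_since(value: str, now: Optional[datetime] = None) -> datetime:
    """Convert a --since value ("7d" or an ISO date) into a UTC cutoff."""
    now = now or datetime.now(timezone.utc)
    value = value.strip()

    # Relative form: an integer followed by "d" (days).
    match = re.fullmatch(r"(\d+)d", value)
    if match:
        return now - timedelta(days=int(match.group(1)))

    # Otherwise treat the value as an ISO date or datetime.
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: naive dates are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

Files whose modification time is at or after the returned cutoff would then be candidates for the rescan.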
Output Format
{
  "files_scanned": 150,
  "files_with_cwes": 23,
  "total_cwe_mappings": 45,
  "cwe_summary": {
    "CWE-120": 5,
    "CWE-787": 3,
    "CWE-20": 12
  },
  "bridge_tags": ["Fragility", "Resilience"],
  "stored_to_memory": 45
}
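A short sketch of consuming this report from Python follows, assuming the JSON above has been captured as a string (how run.sh emits it, for example on stdout, is not specified here). The helpers top_cwes and hit_rate are hypothetical names used only for illustration.

```python
import json
from typing import Any, Dict, List, Tuple

# Sample report copied from the output format above.
REPORT_JSON = """
{
  "files_scanned": 150,
  "files_with_cwes": 23,
  "total_cwe_mappings": 45,
  "cwe_summary": {"CWE-120": 5, "CWE-787": 3, "CWE-20": 12},
  "bridge_tags": ["Fragility", "Resilience"],
  "stored_to_memory": 45
}
"""


def top_cwes(report: Dict[str, Any], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent CWEs, highest count first, ties by ID."""
    items = report["cwe_summary"].items()
    return sorted(items, key=lambda kv: (-kv[1], kv[0]))[:n]


def hit_rate(report: Dict[str, Any]) -> float:
    """Fraction of scanned files that produced at least one CWE mapping."""
    return report["files_with_cwes"] / report["files_scanned"]


report = json.loads(REPORT_JSON)
print(top_cwes(report))
print(f"{hit_rate(report):.1%} of files had CWE matches")
```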
Integration with /taxonomy
The skill uses /taxonomy with collection="sparta" to extract CWEs:
from taxonomy import extract_taxonomy
# For each source file
result = extract_taxonomy(
    source_code,
    collection="sparta",
    include_cwes=True,
    validate_cwes=True  # Second-stage LLM filter
)
# result.cwe_mappings contains:
# [{"cwe_id": "CWE-120", "name": "Buffer Copy...", "category": "MemorySafety", "relevance": 0.8}]
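Once mappings in the shape shown above are available, a caller might drop low-relevance matches and group the rest by category before storing them. The sketch below works on plain dictionaries and does not call /taxonomy. The 0.5 threshold, the group_by_category helper, and the second sample entry are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_category(
    mappings: List[Dict[str, Any]], min_relevance: float = 0.5
) -> Dict[str, List[str]]:
    """Group CWE IDs by category, keeping entries at or above the threshold."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for mapping in mappings:
        if mapping["relevance"] >= min_relevance:
            grouped[mapping["category"]].append(mapping["cwe_id"])
    return dict(grouped)


# First entry mirrors the documented shape; the second is made up.
sample_mappings = [
    {"cwe_id": "CWE-120", "name": "Buffer Copy...",
     "category": "MemorySafety", "relevance": 0.8},
    {"cwe_id": "CWE-20", "name": "Improper Input Validation",
     "category": "InputValidation", "relevance": 0.3},
]

print(group_by_category(sample_mappings))
```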
Scheduler Integration
Register for nightly scans:
.agent/skills/scheduler/run.sh register \
  --name "cwe-code-rescan-nightly" \
  --cron "0 4 * * *" \
  --command ".agent/skills/ingest-code/run.sh rescan --validate" \
  --description "Nightly codebase CWE rescan"
CWE Categories Detected
Via the Fragility bridge in /taxonomy:
| Category | Example CWEs | Triggers |
|---|---|---|
| MemorySafety | CWE-120, CWE-787, CWE-416 | buffer, overflow, memory, pointer |
| InputValidation | CWE-20, CWE-89, CWE-78 | input, validation, inject, command |
| Authentication | CWE-287, CWE-798, CWE-522 | auth, credential, password, session |
| Authorization | CWE-269, CWE-862, CWE-863 | privilege, permission, access control |
| Cryptography | CWE-311, CWE-327, CWE-330 | encrypt, crypto, key, random |
| SpaceSystems | CWE-1281, CWE-345, CWE-353 | spacecraft, firmware, telemetry |
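To make the Triggers column concrete, the sketch below flags candidate categories by case-insensitive substring matching against the keywords in the table. This is only an illustration: how the Fragility bridge actually applies its triggers is not described here, and candidate_categories is a hypothetical name.

```python
from typing import Dict, List

# Trigger keywords copied from the table above.
TRIGGERS: Dict[str, List[str]] = {
    "MemorySafety": ["buffer", "overflow", "memory", "pointer"],
    "InputValidation": ["input", "validation", "inject", "command"],
    "Authentication": ["auth", "credential", "password", "session"],
    "Authorization": ["privilege", "permission", "access control"],
    "Cryptography": ["encrypt", "crypto", "key", "random"],
    "SpaceSystems": ["spacecraft", "firmware", "telemetry"],
}


def candidate_categories(source: str) -> List[str]:
    """Return categories with at least one trigger keyword in the source text."""
    text = source.lower()
    return [
        category
        for category, keywords in TRIGGERS.items()
        if any(keyword in text for keyword in keywords)
    ]


snippet = "char buffer[16]; strcpy(buffer, user_input);"
print(candidate_categories(snippet))
```

Plain substring matching over-triggers easily (for example, "key" matches "keyboard"), which is the kind of false positive the --validate LLM pass is meant to reduce.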
Environment
| Variable | Purpose |
|---|---|
| TAXONOMY_LLM_ENDPOINT | Custom LLM for taxonomy extraction |
| MEMORY_SCOPE | Default memory scope for storage |
Related Skills
| Skill | Relationship |
|---|---|
| /taxonomy | Provides CWE extraction |
| /memory | Stores CWE mappings for retrieval |
| /extractor | CWE scanning for documents (PDFs, etc.) |
| /scheduler | Nightly rescan jobs |
| /treesitter | Code parsing for advanced analysis |
