askill
ingest-code

Ingest codebases for CWE scanning and knowledge extraction. Scans source files, extracts taxonomy tags, and stores CWE mappings in /memory. Designed to run as a /scheduler job for living document updates.

1 star
1.2k downloads
Updated 3/6/2026

Package Files

SKILL.md

ingest-code

Ingest codebases for CWE scanning and knowledge extraction. Extracts taxonomy tags (including CWE mappings) from source files and stores them in /memory for cross-collection multi-hop traversal.

Purpose

This skill bridges codebases with the Federated Taxonomy system:

  1. Scan source files for security-relevant patterns
  2. Extract CWEs via /taxonomy (Fragility bridge)
  3. Store in /memory for recall and multi-hop traversal
  4. Run as /scheduler job for living document updates

Quick Start

cd .agent/skills/ingest-code

# Scan a codebase
./run.sh scan /path/to/codebase

# Scan with LLM validation (reduces false positives)
./run.sh scan /path/to/codebase --validate

# Dry run (no writes to memory)
./run.sh scan /path/to/codebase --dry-run

# Scan specific file types
./run.sh scan /path/to/codebase --glob "*.py"

Commands

scan - Scan Codebase for CWEs

./run.sh scan <path> [OPTIONS]

Options:
  --glob, -g         File pattern to scan (default: "*.py *.c *.cpp *.h *.rs *.go *.java *.ts *.js")
  --validate         Run LLM validation on CWE matches
  --dry-run          Show what would be stored without writing
  --scope            Memory scope (default: "code")
  --batch-size       Files per batch (default: 50)
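
As a rough sketch of how --glob and --batch-size could interact: this assumes the space-separated default is split into individual patterns and matched recursively. The names `collect_files` and `batched` are illustrative and are not taken from the skill's source.

```python
from pathlib import Path
from typing import Iterator

# Default pattern list, copied from the --glob option above.
DEFAULT_GLOB = "*.py *.c *.cpp *.h *.rs *.go *.java *.ts *.js"


def collect_files(root: str, glob: str = DEFAULT_GLOB) -> list[Path]:
    """Return sorted, de-duplicated files under root matching any pattern."""
    found: set[Path] = set()
    for pattern in glob.split():
        # rglob applies the pattern in root and in every subdirectory.
        found.update(p for p in Path(root).rglob(pattern) if p.is_file())
    return sorted(found)


def batched(files: list[Path], batch_size: int = 50) -> Iterator[list[Path]]:
    """Yield consecutive slices of at most batch_size files."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(files), batch_size):
        yield files[start:start + batch_size]
```

With the default batch size, 120 matching files would be processed as batches of 50, 50, and 20.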

rescan - Nightly Rescan (Scheduler Job)

./run.sh rescan [OPTIONS]

Options:
  --since            Only files modified since the given point: an ISO date or a relative window ("1d", "7d", etc.)
  --validate         Run LLM validation
  --scope            Memory scope
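
The --since value could be interpreted along these lines. This is only a sketch: it handles day-based windows such as "7d" and ISO dates, treats dates without a time zone as UTC, and uses the hypothetical helper names `parse_since` and `modified_since`. The skill may accept other units or apply different rules.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_since(value: str, now: Optional[datetime] = None) -> datetime:
    """Turn "1d"/"7d" or an ISO date into a timezone-aware UTC cutoff."""
    now = now or datetime.now(timezone.utc)
    match = re.fullmatch(r"(\d+)d", value.strip())
    if match:
        # Relative window: N days before "now".
        return now - timedelta(days=int(match.group(1)))
    parsed = datetime.fromisoformat(value.strip())
    if parsed.tzinfo is None:
        # Assumption: dates given without a time zone are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def modified_since(mtime: float, cutoff: datetime) -> bool:
    """True if a file mtime (seconds since the epoch) is at or after the cutoff."""
    return datetime.fromtimestamp(mtime, tz=timezone.utc) >= cutoff
```

A rescan would then keep only files for which `modified_since(path.stat().st_mtime, cutoff)` is true.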

Output Format

{
  "files_scanned": 150,
  "files_with_cwes": 23,
  "total_cwe_mappings": 45,
  "cwe_summary": {
    "CWE-120": 5,
    "CWE-787": 3,
    "CWE-20": 12
  },
  "bridge_tags": ["Fragility", "Resilience"],
  "stored_to_memory": 45
}
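
A report of this shape could be assembled from per-file results roughly as follows. The sketch assumes each file yields a list of mapping dicts with a "cwe_id" key and that total_cwe_mappings counts every mapping across all files. The function name `summarize` is illustrative, and bridge_tags is left out because the source does not say how it is derived.

```python
from collections import Counter


def summarize(per_file: dict[str, list[dict]], stored: int) -> dict:
    """Build a scan report from {file path: [CWE mapping, ...]}."""
    counts = Counter(
        mapping["cwe_id"]
        for mappings in per_file.values()
        for mapping in mappings
    )
    return {
        "files_scanned": len(per_file),
        # Only files with at least one mapping count here.
        "files_with_cwes": sum(1 for mappings in per_file.values() if mappings),
        "total_cwe_mappings": sum(counts.values()),
        "cwe_summary": dict(counts),
        "stored_to_memory": stored,
    }
```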

Integration with /taxonomy

The skill uses /taxonomy with collection="sparta" to extract CWEs:

from taxonomy import extract_taxonomy

# For each source file
result = extract_taxonomy(
    source_code,
    collection="sparta",
    include_cwes=True,
    validate_cwes=True  # Second-stage LLM filter
)

# result.cwe_mappings contains:
# [{"cwe_id": "CWE-120", "name": "Buffer Copy...", "category": "MemorySafety", "relevance": 0.8}]
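
Downstream code might post-process mappings of that shape as sketched here. The sketch works on plain dicts with the keys shown in the comment above. The `min_relevance` threshold and the function name `group_by_category` are assumptions for illustration, not part of the /taxonomy API.

```python
from collections import defaultdict


def group_by_category(
    cwe_mappings: list[dict], min_relevance: float = 0.5
) -> dict[str, list[str]]:
    """Group CWE ids by category, dropping low-relevance matches."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for mapping in cwe_mappings:
        # Hypothetical cutoff: keep only mappings at or above min_relevance.
        if mapping["relevance"] >= min_relevance:
            grouped[mapping["category"]].append(mapping["cwe_id"])
    return dict(grouped)
```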

Scheduler Integration

Register for nightly scans:

.agent/skills/scheduler/run.sh register \
  --name "cwe-code-rescan-nightly" \
  --cron "0 4 * * *" \
  --command ".agent/skills/ingest-code/run.sh rescan --validate" \
  --description "Nightly codebase CWE rescan"
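
The cron expression "0 4 * * *" fires once a day at 04:00. The sketch below computes the next such run from a given time. It is only an illustration of the schedule: how /scheduler evaluates cron expressions, and in which time zone, is not stated here. The function name `next_daily_run` is hypothetical.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int = 4, minute: int = 0) -> datetime:
    """Return the next occurrence of hour:minute strictly after now."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is exactly now); use tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```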

CWE Categories Detected

The Fragility bridge in /taxonomy detects the following CWE categories:

Category         Example CWEs                Triggers
MemorySafety     CWE-120, CWE-787, CWE-416   buffer, overflow, memory, pointer
InputValidation  CWE-20, CWE-89, CWE-78      input, validation, inject, command
Authentication   CWE-287, CWE-798, CWE-522   auth, credential, password, session
Authorization    CWE-269, CWE-862, CWE-863   privilege, permission, access control
Cryptography     CWE-311, CWE-327, CWE-330   encrypt, crypto, key, random
SpaceSystems     CWE-1281, CWE-345, CWE-353  spacecraft, firmware, telemetry
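
To illustrate how trigger words can select categories, here is a minimal sketch that uses case-insensitive substring matching over the triggers listed above. This is an assumption about the mechanism, not the actual /taxonomy implementation, which may tokenize, weight, or validate matches differently. The function name `matching_categories` is hypothetical.

```python
# Trigger words per category, copied from the table above.
TRIGGERS = {
    "MemorySafety": ["buffer", "overflow", "memory", "pointer"],
    "InputValidation": ["input", "validation", "inject", "command"],
    "Authentication": ["auth", "credential", "password", "session"],
    "Authorization": ["privilege", "permission", "access control"],
    "Cryptography": ["encrypt", "crypto", "key", "random"],
    "SpaceSystems": ["spacecraft", "firmware", "telemetry"],
}


def matching_categories(source_code: str) -> list[str]:
    """Return categories with at least one trigger word present in the text."""
    text = source_code.lower()
    return [
        category
        for category, words in TRIGGERS.items()
        if any(word in text for word in words)
    ]
```

Plain substring matching is deliberately crude and would flag many benign files, which is the kind of false positive the --validate option is meant to reduce.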

Environment

Variable               Purpose
TAXONOMY_LLM_ENDPOINT  Custom LLM for taxonomy extraction
MEMORY_SCOPE           Default memory scope for storage
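
A sketch of how these variables might be resolved at startup. The precedence shown (an explicit --scope value, then MEMORY_SCOPE, then the documented default "code") is an assumption, as is the function name `resolve_settings`.

```python
import os
from typing import Mapping, Optional


def resolve_settings(
    env: Mapping[str, str] = os.environ, cli_scope: Optional[str] = None
) -> dict:
    """Combine a CLI scope override with the environment variables above."""
    return {
        # Assumed precedence: CLI flag, then MEMORY_SCOPE, then "code".
        "scope": cli_scope or env.get("MEMORY_SCOPE") or "code",
        # None here means no custom endpoint was configured.
        "llm_endpoint": env.get("TAXONOMY_LLM_ENDPOINT"),
    }
```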

Related Skills

Skill        Relationship
/taxonomy    Provides CWE extraction
/memory      Stores CWE mappings for retrieval
/extractor   CWE scanning for documents (PDFs, etc.)
/scheduler   Nightly rescan jobs
/treesitter  Code parsing for advanced analysis

Install

Requires askill CLI v1.0+

AI Quality Score

86/100 (analyzed 2/23/2026)

A well-structured skill for CWE codebase ingestion with a clear purpose, comprehensive command documentation, and well-defined integration points. It lives in a dedicated skills folder with good metadata, uses structured triggers, offers multiple operational modes (scan, rescan, dry-run), and includes scheduler integration. It references some internal systems, but the pattern is generally applicable.

Metadata

License    unknown
Version    -
Updated    3/6/2026
Publisher  grahama1970

Tags

ci-cd, llm, security