Sysadmin Ops Skill

Use when the user asks to investigate crashes, runaway memory, or machine instability, or to design autonomous reliability guardrails for Pi workflows.

Primary goals

- Stabilize the host quickly.
- Preserve evidence for root-cause analysis.
- Restore interrupted workflows with minimal context loss.
- Prevent recurrence with layered guardrails.

Incident triage workflow

1. Confirm event window
   - capture local timestamp range
   - list active/affected workspaces
2. Collect host forensics (macOS)
   - inspect /Library/Logs/DiagnosticReports/JetsamEvent-*.ips
   - summarize top process families by aggregate RSS and process count
   - extract evidence of process storms (count, age distribution, coalition hints)
3. Collect Pi execution forensics
   - parse ~/.pi/agent/sessions/** for bash commands around event window
   - identify high-risk commands (tests/builds/nested CLI/orchestration)
   - detect unfinished commands and crash-correlated sessions
4. Containment recommendations
   - immediate limits (session count, concurrency, timeouts)
   - command-level guardrails for sharp edges
   - optional emergency stop procedure
5. Recovery plan
   - regenerate per-workspace handoff state
   - enumerate next-resume checklist per workspace
6. Preventive hardening
   - extension guardrails
   - slice policy changes
   - watchdog automation and alerts
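
The "summarize top process families by aggregate RSS and process count" step can be sketched roughly as below. This is a minimal illustration, not the skill's actual implementation: it assumes the process names and resident sizes have already been extracted from the diagnostic reports into simple records, and `ProcSample`, `family_of`, and `summarize_families` are hypothetical names. The family rule (strip a trailing "Helper ..." suffix) is only a placeholder heuristic.

```python
import re
from collections import defaultdict
from typing import Iterable, NamedTuple


class ProcSample(NamedTuple):
    name: str       # process name as reported by the host
    rss_bytes: int  # resident memory of that single process


def family_of(name: str) -> str:
    """Placeholder grouping rule: lowercase and drop a trailing ' Helper ...' suffix."""
    return re.sub(r"\s+helper.*$", "", name.lower())


def summarize_families(samples: Iterable[ProcSample], top: int = 5):
    """Return (family, process_count, total_rss_bytes) rows, largest total RSS first."""
    totals = defaultdict(lambda: [0, 0])  # family -> [count, total rss]
    for sample in samples:
        entry = totals[family_of(sample.name)]
        entry[0] += 1
        entry[1] += sample.rss_bytes
    ranked = sorted(
        ((fam, count, rss) for fam, (count, rss) in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )
    return ranked[:top]


if __name__ == "__main__":
    samples = [
        ProcSample("node", 400_000_000),
        ProcSample("node", 350_000_000),
        ProcSample("Google Chrome", 300_000_000),
        ProcSample("Google Chrome Helper (Renderer)", 120_000_000),
    ]
    for fam, count, rss in summarize_families(samples):
        print(f"{fam}: {count} procs, {rss / 1e6:.0f} MB")
```

A family with a high process count but modest per-process RSS is the kind of signal that points toward a process storm rather than a single leaking process.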

Key sharp edges to check

- nested non-interactive pi invocations inside agents
- unbounded test/build commands without explicit timeout
- test runners without worker caps
- team/pipeline recursion loops
- many simultaneous workspaces with heavy runners
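
As one possible starting point for command-level checks, the first three sharp edges above could be flagged with a simple token-based heuristic. The sketch below makes several assumptions: the CLI binary is named `pi`, a leading `timeout` or `gtimeout` wrapper counts as an explicit timeout, and the runner list is a hypothetical starter set. Only the long-form Jest flags `--maxWorkers` and `--runInBand` are treated as worker caps. `check_command` is an illustrative name, and the heuristic can over-report (for example, when `pi` appears as a plain argument).

```python
import shlex

# Hypothetical starter lists; tune them to the runners used in your workspaces.
HEAVY_RUNNERS = {"pytest", "jest", "make", "cargo", "npm", "pnpm", "yarn"}
TIMEOUT_WRAPPERS = {"timeout", "gtimeout"}
JEST_WORKER_CAPS = ("--maxWorkers", "--runInBand")


def check_command(command: str) -> list[str]:
    """Return sharp-edge warnings for one shell command string."""
    tokens = shlex.split(command)
    if not tokens:
        return []
    # Compare on basenames so /usr/local/bin/jest still matches "jest".
    names = [token.rsplit("/", 1)[-1] for token in tokens]
    warnings = []
    if "pi" in names:
        warnings.append("nested pi invocation")
    if any(n in HEAVY_RUNNERS for n in names) and names[0] not in TIMEOUT_WRAPPERS:
        warnings.append("heavy runner without explicit timeout")
    if "jest" in names and not any(t.startswith(JEST_WORKER_CAPS) for t in tokens):
        warnings.append("jest without worker cap")
    return warnings


if __name__ == "__main__":
    for cmd in ("npx jest", "timeout 600 npx jest --maxWorkers=2"):
        print(cmd, "->", check_command(cmd))
```

Recursion loops and workspace-level concurrency are not visible from a single command string, so they would need session-level or host-level checks instead.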

Output contract

## Incident Summary

## Forensic Evidence
- host
- session timeline

## Likely Root Causes (ranked)

## Immediate Containment

## Recovery Plan

## Hardening Plan
- now
- next
- later
