askill
sysadmin-ops

sysadmin-opsSafety 85Repository

Investigate host stability incidents (memory/process pressure), triage likely culprits, and drive safe recovery + hardening for multi-workspace Pi operations.

5 stars
1.2k downloads
Updated 2 weeks ago

Package Files

Loading files...
SKILL.md

Sysadmin Ops Skill

Use when the user asks to investigate crashes, runaway memory, machine instability, or to design autonomous reliability guardrails for Pi workflows.

Primary goals

  1. Stabilize the host quickly.
  2. Preserve evidence for root-cause analysis.
  3. Restore interrupted workflows with minimal context loss.
  4. Prevent recurrence with layered guardrails.

Incident triage workflow

  1. Confirm event window

    • capture local timestamp range
    • list active/affected workspaces
  2. Collect host forensics (macOS)

    • inspect /Library/Logs/DiagnosticReports/JetsamEvent-*.ips
    • summarize top process families by aggregate RSS and process count
    • extract evidence of process storms (count, age distribution, coalition hints)
  3. Collect Pi execution forensics

    • parse ~/.pi/agent/sessions/** for bash commands around event window
    • identify high-risk commands (tests/builds/nested CLI/orchestration)
    • detect unfinished commands and crash-correlated sessions
  4. Containment recommendations

    • immediate limits (session count, concurrency, timeouts)
    • command-level guardrails for sharp edges
    • optional emergency stop procedure
  5. Recovery plan

    • regenerate per-workspace handoff state
    • enumerate next-resume checklist per workspace
  6. Preventive hardening

    • extension guardrails
    • slice policy changes
    • watchdog automation and alerts

Key sharp edges to check

  • nested non-interactive pi invocations inside agents
  • unbounded test/build commands without explicit timeout
  • test runners without worker caps
  • team/pipeline recursion loops
  • many simultaneous workspaces with heavy runners

Output contract

## Incident Summary

## Forensic Evidence
- host
- session timeline

## Likely Root Causes (ranked)

## Immediate Containment

## Recovery Plan

## Hardening Plan
- now
- next
- later

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

50/100Analyzed last week

Framework-style skill for Pi host stability incidents. Provides structured triage workflow and output template but lacks specific commands/tools. Targets internal repo-specific paths (~/.pi/agent) making it project-specific. Has good "Use when" trigger and structured steps, but actionability limited by absence of executable commands. Tags (ci-cd, github-actions, testing) don't align well with the sysadmin-ops focus.

85
70
40
55
45

Metadata

Licenseunknown
Version-
Updated2 weeks ago
Publisherphrazzld

Tags

ci-cdgithub-actionstesting