Skills Security Audit
The Audit Mindset
Treat skills as dependencies. Shell instructions in hooks run before the model reasons about output. A malicious skill has the same access as any npm package you'd npm install - except it runs with your permissions and sees your conversation context.
Trust nothing, verify everything. Even skills from "reputable" sources can be:
- Compromised via supply chain attacks
- Contain vulnerabilities that enable exploitation
- Have overly broad permissions that create risk
Quick Reference: Detection Categories
| Category | Severity | What to Look For |
|---|---|---|
| Network Exfiltration | CRITICAL | curl, wget, nc, DNS lookups, base64 in URLs |
| Lateral Movement | CRITICAL | SSH config, scp, rsync, ~/.ssh/* access |
| Credential Harvesting | CRITICAL | .env reading, keychain, AWS/GCP creds |
| Prompt Injection | CRITICAL | System prompt overrides, safety bypass |
| Persistence | HIGH | cron, launchd, .bashrc mods, startup items |
| MCP Server Risks | HIGH | Untrusted servers, tool shadowing |
| Data Staging | HIGH | Archive creation, temp dir ops, clipboard |
| Obfuscated Code | HIGH | Base64/hex encoding, dynamic code execution, minified |
| Shell Execution | MEDIUM | Unrestricted bash, command injection |
| File System Scope | MEDIUM | Broad globs, parent traversal |
| Permission Scope | LOW | Permissions exceeding stated purpose |
Audit Workflow
Phase 1: Inventory
First, understand what you're auditing:
# List all files in the skill/plugin
find <skill-path> -type f | head -100
# Identify file types
find <skill-path> -type f -exec file {} \;
# Check for binaries (immediate concern)
find <skill-path> -type f \( -perm -u+x -o -name "*.so" -o -name "*.dylib" -o -name "*.exe" \)
Red flags at this stage:
- Binary/compiled files (why would a skill need these?)
- Unusual file extensions
- Hidden files (.hidden)
- Symlinks to system directories
Phase 2: Static Analysis
Scan for dangerous patterns. See references/detection-patterns.md for complete patterns.
Critical patterns to grep:
# Network exfiltration
grep -rn "curl\|wget\|nc \|netcat\|/dev/tcp\|/dev/udp" <skill-path>
# Credential access
grep -rn "\.env\|AWS_\|OPENAI_API\|ssh/\|\.ssh\|keychain\|credentials" <skill-path>
# Obfuscation
grep -rn "base64\|\\\\x[0-9a-f]" <skill-path>
# Persistence
grep -rn "crontab\|launchd\|\.bashrc\|\.zshrc\|startup\|autorun" <skill-path>
For MCP servers, also check:
- What servers are configured?
- Are they from known/trusted sources?
- What tools do they expose?
Phase 3: Behavioral Analysis
Trace what happens when the skill activates:
-
Hook Analysis: Check for PreToolUse, PostToolUse, Stop, SessionStart hooks
- What commands do they run?
- Do they capture/transmit data?
-
File Operations: What files does the skill read/write?
- Does it access files outside its directory?
- Does it create files in unexpected locations?
-
Network Behavior: Does it make network requests?
- To what domains?
- With what data?
-
Environment Access: Does it read environment variables?
- Which ones?
- What does it do with them?
Phase 4: Trust Analysis
Evaluate the supply chain:
-
Source Verification
- Where did this skill come from?
- Is the source reputable?
- Can you verify the author?
-
Dependency Check
- Does it fetch external code at runtime?
- Does it reference git repos, npm packages?
- Are those dependencies trustworthy?
-
Permission Audit
- What permissions does it request?
- Do those permissions match its stated purpose?
- Is it overly broad?
-
MCP Server Trust (see
references/mcp-risks.md)- Are MCP servers from known sources?
- Do they request appropriate permissions?
- Could they shadow built-in tools?
Phase 5: Report Generation
Generate a structured report:
## Security Audit Report: [skill-name]
**Audit Date:** YYYY-MM-DD
**Auditor:** Claude Code Security Audit Skill
**Risk Level:** CRITICAL | HIGH | MEDIUM | LOW | CLEAN
### Executive Summary
[One paragraph summary of findings and recommendation]
### Critical Findings
[For each critical finding:]
- **[CRITICAL] [Category]:** [Description]
- Evidence: `[file:line]` - `[code snippet]`
- Risk: [What could happen if exploited]
- Remediation: [How to fix or mitigate]
### High Findings
[Same format as critical]
### Medium Findings
[Same format]
### Low Findings
[Same format]
### Files Analyzed
- [List of all files examined]
### Patterns Checked
- [List of detection patterns applied]
### Recommendation
[ ] SAFE TO USE - No significant issues found
[ ] USE WITH CAUTION - Minor issues, monitor behavior
[ ] REQUIRES REMEDIATION - Fix issues before use
[ ] DO NOT USE - Critical security risks identified
Red Flags: Immediate Rejection
These findings should result in immediate CRITICAL rating and recommendation to NOT USE:
- Any curl/wget to non-localhost URLs - Why does a skill need to phone home?
- Any access to ~/.ssh/ or credential files - No legitimate reason for this
- Base64-encoded shell commands - Classic obfuscation technique
- MCP servers from unknown sources - Unverified code execution
- Instructions to "ignore safety" or "override system prompt" - Prompt injection
- Dynamic code execution of external content - Code injection vector
- Writing to .bashrc/.zshrc or cron - Persistence mechanism
Quick Scan Command
For a fast initial scan, use the quick-scan script:
${CLAUDE_PLUGIN_ROOT}/skills/audit/scripts/quick-scan.sh <skill-path>
This performs basic pattern matching and reports potential issues. Follow up with manual review for any findings.
Reference Documents
references/detection-patterns.md- Complete grep patterns for all categoriesreferences/mcp-risks.md- MCP-specific threat model and detectionreferences/prompt-injection.md- Prompt injection detection techniques
Examples
examples/malicious-skill/- Example malicious skill demonstrating attack patternsexamples/clean-skill/- Example clean skill following best practices
Use these for testing and comparison during audits.
