Performance Engineer

Expert guidance for optimizing Urbit ship performance across system-level tuning, application profiling, capacity planning, and SLA compliance.

Performance Domains

1. System-Level

I/O Scheduler: Tuning for SSD/HDD
Kernel Parameters: sysctl optimization
Filesystem: mount options, alignment
CPU Affinity: Pinning ships to cores

2. Ship-Level

|pack/|meld: Scheduling and deduplication
Loom Sizing: Memory allocation for ships
App Suspension: Disabling resource-heavy agents
State Management: Reducing pier bloat

3. Infrastructure

Storage Selection: SSD type, IOPS, capacity
Network Tuning: TCP buffers, BBR, firewall
Resource Allocation: CPU cores, RAM limits
Monitoring: Metrics collection, alerting

4. Application

Gall Agent Optimization: Efficient state structures
Caching Strategies: In-memory hot data
Streaming: Processing large data incrementally
Jet Utilization: Using accelerated stdlib

Bottleneck Identification

Disk I/O Bottleneck

Symptoms:

iowait% >20% (sustained)
Slow |pack completion (>5 minutes for 5GB pier)
Lag in dojo responsiveness
High disk I/O in monitoring

Diagnostic:

iostat -x 5  # Check %util, await, svctm
iotop -oP  # Identify I/O-heavy processes
df -h /path/to/pier  # Disk space

Resolutions (ordered by effectiveness):

Upgrade to NVMe SSD: 10x-100x improvement over HDD
I/O scheduler: echo "none" > /sys/block/nvme0n1/queue/scheduler
Filesystem: Add noatime,nodiratime to /etc/fstab
Run |pack: Defragments pier, reduces I/O load
Reduce ship count: Distribute ships across servers

CPU Bottleneck

Symptoms:

Sustained >80% CPU usage
Slow event processing
Delayed |pack/|meld completion
High load average (>4 for quad-core)

Diagnostic:

top  # Check CPU usage per process
htop  # Visualize CPU distribution
# In dojo:
|mass  # Memory and CPU breakdown

Resolutions:

Suspend unused apps: |suspend %app-name in dojo
Increase CPU priority: Add Nice=-10 in systemd [Service]
CPU affinity: CPUAffinity=0-3 in systemd [Service]
Upgrade CPU: Faster clock speed (single-thread performance critical)
Distribute ships: Move to additional servers

Memory Bottleneck

Symptoms:

"out of loom" errors (memory limit exceeded)
Frequent |pack needed (weekly or daily)
OOM kills (system RAM exhaustion)
Swap usage >50%

Diagnostic:

free -h  # Check available RAM
# In dojo:
|mass  # Memory usage breakdown per app

Resolutions:

Run |meld: Deduplication requires 8GB+ RAM during execution
Increase loom: --loom 32 (4GB) or --loom 33 (8GB)
Suspend memory-heavy apps: Identify with |mass
Upgrade system RAM: 8GB minimum for comfortable operation
Weekly |pack: Preventive maintenance (reduces memory fragmentation)

Network Bottleneck

Symptoms:

Slow Ames communication (subscriptions delay >30s)
Delayed message delivery
High network latency
Packet loss

Diagnostic:

ping -c 100 8.8.8.8  # Check latency
iftop  # Monitor bandwidth usage
# In dojo:
|hi ~zod  # Test Ames connectivity
# Check firewall:
sudo ufw status

Resolutions:

Firewall: Verify UDP 34543 open: sudo ufw allow 34543/udp
TCP tuning: Increase buffer sizes in /etc/sysctl.conf
Bandwidth: Upgrade VPS plan if saturated
Geographic proximity: Deploy VPS closer to users
BBR congestion control: net.ipv4.tcp_congestion_control=bbr

Performance Tuning

Essential Tuning

SSD Storage

Type: NVMe preferred
I/O Scheduler: none (for NVMe), deadline (for SATA SSD)
Mount Options: noatime,nodiratime
Alignment: 4K sectors (align with SSD blocks)

Filesystem

# /etc/fstab
/dev/nvme0n1p1 /path/to/pier ext4 defaults,noatime,nodiratime 0 0

Kernel Parameters

# /etc/sysctl.conf
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_congestion_control = bbr
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 3

Loom Sizing

# 4GB RAM: --loom 32
# 8GB+ RAM: --loom 33
urbit --loom 32 /path/to/pier

CPU Priority

# systemd service file
[Service]
Nice=-10
# For multi-core:
CPUAffinity=0-3

Preventive Maintenance

Weekly |pack

# cron: 0 2 * * 0 (2 AM every Sunday)
/path/to/urbit -F /path/to/pier/.urb/put | pack &

Monthly |meld (8GB+ RAM only)

# cron: 0 3 1 * * (3 AM on 1st of month)
/path/to/urbit -F /path/to/pier/.urb/put | meld &

Automated Backup

# cron: 0 4 * * * (4 AM daily)
#!/bin/bash
DATE=$(date +%Y%m%d)
urbit -I 30 /path/to/pier/.urb/put | pack &
# Wait for ship to stop
tar -czf /backup/pier-${DATE}.tar.gz /path/to/pier
urbit /path/to/pier &

Capacity Planning

Growth Projections

# Calculate monthly pier growth
CURRENT=$(du -sb /path/to/pier | awk '{print $1}')
PREVIOUS=$(cat /var/log/pier-size-30d-ago.txt)
GROWTH=$((CURRENT - PREVIOUS))
MONTHLY_GROWTH_GB=$((GROWTH / 1073741824))

echo "Monthly growth: ${MONTHLY_GROWTH_GB}GB"

# Project remaining capacity
DISK_FREE=$(df /path/to/pier | tail -1 | awk '{print $4}')
MONTHS_REMAINING=$((DISK_FREE / GROWTH))
echo "Months until disk full: $MONTHS_REMAINING"

Scaling Triggers

Vertical scaling (upgrade current VPS):

RAM utilization >80% for 7+ days
Disk >85% full
CPU >80% sustained
Response time >2s (95th percentile)

Horizontal scaling (add servers):

Managing 10+ ships on single VPS
Need geographic distribution
Need to isolate critical vs non-critical ships
Single point of failure concern

SLA Targets

Availability

Target: 99.9% uptime (8.76 hours downtime/year)
Measurement: uptime, systemd logs, external monitoring
Improvements: Automated monitoring, alerting, redundancy, failover

Response Time

Target: Dojo command response <1 second (95th percentile)
Measurement: Manual testing, custom instrumentation, monitoring dashboards
Improvements: |pack, SSD upgrade, RAM increase, app optimization

Pier Size Growth

Target: <10GB growth per month for typical ship
Measurement: du -sh /path/to/pier (tracked weekly)
Improvements: |pack (weekly), |meld (monthly), app cleanup, state migration

Monitoring Setup

Prometheus Node Exporter

# Install node exporter
curl -fsSL https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz | tar xz
cd node_exporter-1.6.1.linux-amd64
./node_exporter --web.listen-address=:9100

Ship Metrics

# prometheus.yml
scrape_configs:
  - job_name: 'urbit-ship'
    static_configs:
      - targets: ['localhost:9100']
    metrics_path: '/metrics'

Grafana Dashboard

CPU Usage: System-wide and per-process
Memory Usage: System RAM, loom usage, swap
Disk I/O: %util, await, svctm, read/write rates
Network: Bandwidth, latency, packet loss
Ship-specific: Pier size, event throughput, agent state

Best Practices

Measure first: Establish baseline before optimizing
One change at a time: Isolate what improves performance
SSD mandatory: HDD too slow for production Urbit
Preventive |pack: Weekly automation prevents emergencies
Monitor trends: Track pier size, CPU, RAM over time
Document changes: Record all tuning parameters in config files
Test thoroughly: Verify ship functionality after changes
Plan capacity: Scale before hitting limits (not after)
Automate maintenance: Cron jobs for |pack, backups, monitoring
Hardware > software: Sometimes upgrading hardware is better than optimization

Performance Optimization Checklist

Storage

NVMe SSD (or high-quality SATA SSD)
I/O scheduler: 'none' for NVMe, 'deadline' for SATA
Filesystem: noatime,nodiratime options
Adequate disk space: <85% usage

Memory

8GB+ RAM (recommended)
Loom sized: --loom 32 (4GB) or --loom 33 (8GB+)
Weekly |pack scheduled
Monthly |meld (if 8GB+ RAM)
Swap: swappiness=10 (or disabled if 8GB+)

CPU

Appropriate CPU: >=2 cores recommended
Nice=-10 for urbit service
CPU affinity: Pin to specific cores
Unused apps suspended via |mass

Network

Firewall: UDP 34543 open
TCP tuning: rmem_max, wmem_max, congestion_control
Geographic proximity: VPS near users
Adequate bandwidth: Monitor saturation

Monitoring

Prometheus node exporter installed
Grafana dashboard configured
Alerting rules configured
Weekly capacity projections reviewed
SLA targets documented

performance-engineer-assistantSafety 95Repository ShareFavorite skill

Package Files

Performance Engineer

Performance Domains

1. System-Level

2. Ship-Level

3. Infrastructure

4. Application

Bottleneck Identification

Disk I/O Bottleneck

CPU Bottleneck

Memory Bottleneck

Network Bottleneck

Performance Tuning

Essential Tuning

SSD Storage

Filesystem

Kernel Parameters

Loom Sizing

CPU Priority

Preventive Maintenance

Weekly |pack

Monthly |meld (8GB+ RAM only)

Automated Backup

Capacity Planning

Growth Projections

Scaling Triggers

SLA Targets

Availability

Response Time

Pier Size Growth

Monitoring Setup

Prometheus Node Exporter

Ship Metrics

Grafana Dashboard

Best Practices

Performance Optimization Checklist

Storage

Memory

CPU

Network

Monitoring

Resources

Install

AI Quality Score

Metadata

Tags

performance-engineer-assistantSafety 95Repository