askill
agent-troubleshooting

Safety score: 95

Diagnoses Buildkite agent issues like stuck jobs and queue mismatches.

0 stars
1.2k downloads
Updated 1/29/2026

Package Files

SKILL.md

Agent Troubleshooting

Diagnose why jobs aren't being picked up by agents.

When to use

  • "My build is stuck waiting for an agent"
  • "Jobs aren't being picked up"
  • "Why is my build stuck in scheduled?"
  • "Agent not running my job"
  • "Queue issues"
  • "No agents available"

Available MCP Tools

| Tool | Purpose |
|------|---------|
| get_build | Get job details, including agent requirements |
| list_clusters | List available clusters |
| get_cluster | Get detailed cluster information |
| list_cluster_queues | List queues in a cluster |
| get_cluster_queue | Get queue stats (agent count, jobs waiting) |

Input Parsing

Users typically describe a symptom:

| Input | Likely issue |
|-------|--------------|
| "build stuck" | Job in scheduled state |
| "waiting for agent" | No matching agents |
| "job not starting" | Agent configuration mismatch |
| "queue problem" | Queue doesn't exist or has no agents |

Ask for the build number or URL to investigate.

Approach

  1. Get the build with get_build

    • Find the stuck job
    • Note its state (scheduled, assigned, etc.)
    • Extract agent query rules (queue, tags)
  2. Check cluster/queue configuration

    • List clusters with list_clusters
    • List queues with list_cluster_queues
    • Get queue stats with get_cluster_queue
  3. Compare requirements vs availability

    • What does the job require?
    • What agents/queues exist?
    • Where's the mismatch?
  4. Provide diagnosis and fix
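Step 3 can be sketched in Python as an ordered set of checks that mirrors the common issues below. This is a minimal sketch, not Buildkite's actual matching logic, and it rests on several assumptions:

- The `queues` shape is hypothetical. You would assemble it from the list_cluster_queues and get_cluster_queue results.
- Tags are compared by exact string equality.
- A job with no explicit queue is treated as targeting "default".

```python
# Minimal sketch of step 3: compare job requirements against what each queue offers.
# Input shapes are hypothetical; map the MCP tool results into them first.

def diagnose(required: dict[str, str], queues: dict[str, dict]) -> str:
    """Return a one-line root cause for a job stuck in `scheduled`.

    required: tags the job targets, e.g. {"queue": "deploy", "docker": "true"}
    queues:   queue name -> {"agents": [tag dict per connected agent],
                             "jobs_waiting": int}
    """
    # Assumption: a job with no explicit queue targets "default".
    name = required.get("queue", "default")
    queue = queues.get(name)
    if queue is None:
        return f'No "{name}" queue exists'
    agents = queue["agents"]
    if not agents:
        return f'Queue "{name}" exists but has 0 connected agents'

    # Exact string comparison only; real agent targeting may be richer.
    wanted = {k: v for k, v in required.items() if k != "queue"}
    matching = [a for a in agents if all(a.get(k) == v for k, v in wanted.items())]
    if not matching:
        return f'No agent in "{name}" matches {wanted}'

    # Matching agents exist but the job is still scheduled: most likely they are busy.
    return (f'{len(matching)} matching agent(s) in "{name}", '
            f'{queue["jobs_waiting"]} job(s) waiting: likely at capacity')
```

The checks run in this order because a missing or empty queue makes tag comparison moot. Only once matching agents are confirmed does a capacity problem become the likely explanation.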

Job States for Agent Issues

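| State | Meaning | Indicates |
|-------|---------|-----------|
| scheduled | Waiting to be assigned to an agent | No matching agent available, or all matching agents busy |
| assigned | Assigned to an agent, waiting for it to accept | Agent was chosen but hasn't picked the job up |
| accepted | Agent accepted the job and is starting it | Should start running soon |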

Jobs stuck in scheduled point to an agent matching or capacity problem.
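Finding the stuck job (step 1 of the approach) amounts to filtering the build's jobs by the states in this table. The Python sketch below assumes the get_build result is a dict with a `jobs` list whose entries carry `name` and `state` keys. That shape is an assumption, so check it against the actual response.

```python
# States in which a job is still waiting on the agent side.
AGENT_WAIT_STATES = {"scheduled", "assigned", "accepted"}

def stuck_jobs(build: dict) -> list[tuple[str, str]]:
    """Return (job name, state) for every job still waiting on an agent.

    Assumes a "jobs" list whose entries carry "name" and "state" keys;
    entries without a state are skipped.
    """
    return [
        (job.get("name") or "(unnamed job)", job["state"])
        for job in build.get("jobs", [])
        if job.get("state") in AGENT_WAIT_STATES
    ]
```

A job in scheduled goes on to the queue and tag checks. A job in assigned or accepted points at the agent host instead.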

Common Issues

1. Queue Mismatch

Symptom: Job stuck in scheduled
Cause: Job requires a queue that doesn't exist or has no agents

```yaml
# Pipeline requires:
agents:
  queue: "deploy"

# But no agents are in the "deploy" queue
```

Diagnosis:

```
Job requires: queue=deploy
Available queues: default (5 agents), build (10 agents)
❌ No "deploy" queue exists
```

Fix: Add agents to the deploy queue, or change the pipeline to use an existing queue.
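The queue check above can be sketched in Python so that it produces the same diagnosis text. The `queue_agents` mapping (queue name to connected agent count) is a hypothetical shape you would build from the queue listing and per-queue stats. The close-name suggestion uses Python's `difflib` as an extra heuristic for typos; it is not a Buildkite feature.

```python
import difflib

def check_queue(required: str, queue_agents: dict[str, int]) -> str:
    """Report whether the required queue exists; suggest close names if not.

    queue_agents: queue name -> connected agent count (hypothetical shape).
    """
    if required in queue_agents:
        return f'Queue "{required}" exists with {queue_agents[required]} agents'
    available = ", ".join(f"{name} ({count} agents)" for name, count in queue_agents.items())
    lines = [
        f"Job requires: queue={required}",
        f"Available queues: {available}",
        f'❌ No "{required}" queue exists',
    ]
    # A near-identical name may point to a typo in the pipeline or agent config.
    close = difflib.get_close_matches(required, list(queue_agents), n=3, cutoff=0.6)
    if close:
        lines.append("Did you mean: " + ", ".join(close) + "?")
    return "\n".join(lines)
```

A queue that exists with 0 agents passes this check. That case is covered under "No Agents Running" below.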

2. Tag Mismatch

Symptom: Job stuck in scheduled
Cause: Job requires tags that no agent has

```yaml
# Pipeline requires:
agents:
  queue: "default"
  docker: "true"
  os: "linux"

# Agents have docker=true but os=macos
```

Diagnosis:

```
Job requires: queue=default, docker=true, os=linux
Available agents in default:
  - agent-1: docker=true, os=macos
  - agent-2: docker=true, os=macos
❌ No agent matches os=linux
```

Fix: Add Linux agents, or remove the os requirement.
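The tag comparison can be sketched in Python so that it reproduces the diagnosis format above. It also distinguishes two cases: a tag that no agent has at all, and tags that are each present somewhere but never together on one agent. The `agents` mapping (agent name to tag dict) is a hypothetical shape, and matching is exact string equality.

```python
def tag_report(required: dict[str, str], agents: dict[str, dict[str, str]]) -> str:
    """Format a tag-mismatch diagnosis for agents already in the job's queue.

    required: the job's agent tags, including "queue"
    agents:   agent name -> that agent's tags (hypothetical shape)
    """
    tags = {k: v for k, v in required.items() if k != "queue"}
    lines = [
        "Job requires: " + ", ".join(f"{k}={v}" for k, v in required.items()),
        f"Available agents in {required.get('queue', 'default')}:",
    ]
    for name, agent_tags in agents.items():
        shown = ", ".join(f"{k}={v}" for k, v in agent_tags.items() if k != "queue")
        lines.append(f"  - {name}: {shown}")

    full = [n for n, t in agents.items() if all(t.get(k) == v for k, v in tags.items())]
    if full:
        lines.append("✅ Matching agents: " + ", ".join(full))
    else:
        # Requirements that no agent satisfies at all.
        unmet = [f"{k}={v}" for k, v in tags.items()
                 if not any(t.get(k) == v for t in agents.values())]
        if unmet:
            lines.append("❌ No agent matches " + ", ".join(unmet))
        else:
            lines.append("❌ Each tag is met by some agent, but no single agent meets all of them")
    return "\n".join(lines)
```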

3. No Agents Running

Symptom: Job stuck in scheduled
Cause: Queue exists but has no connected agents

Diagnosis:

```
Job requires: queue=deploy
Queue "deploy" exists but has 0 connected agents
```

Fix: Start agents, check agent host health, verify network connectivity.

4. All Agents Busy

Symptom: Job stuck in scheduled longer than usual
Cause: Agents exist but are all busy (at capacity)

Diagnosis:

```
Job requires: queue=default
Queue "default": 3 agents, 15 jobs waiting
Average wait time: 12 minutes
```

Fix: Scale up agents, reduce parallelism, or wait.
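Whether to scale up or simply wait can be estimated with rough arithmetic. The Python sketch below rests on three assumptions:

- Each agent runs one job at a time.
- Every job takes about `avg_job_minutes`. This is a hypothetical input you estimate from recent builds.
- Jobs already running are ignored.

```python
import math

def backlog_estimate(agents: int, jobs_waiting: int, avg_job_minutes: float) -> dict[str, float]:
    """Rough time to work through the waiting jobs.

    Assumes one job per agent at a time and a uniform job duration;
    jobs already running are ignored.
    """
    if agents <= 0:
        raise ValueError("no connected agents: see 'No Agents Running' instead")
    rounds = math.ceil(jobs_waiting / agents)  # batches the queue has to work through
    return {
        "jobs_per_agent": jobs_waiting / agents,
        "drain_minutes": rounds * avg_job_minutes,
    }

def agents_needed(jobs_waiting: int, avg_job_minutes: float, target_minutes: float) -> int:
    """Agents required to clear the current backlog within target_minutes."""
    rounds_allowed = max(1, math.floor(target_minutes / avg_job_minutes))
    return math.ceil(jobs_waiting / rounds_allowed)
```

Take the queue above (3 agents, 15 jobs waiting) with an assumed 4-minute job. The backlog is 5 jobs per agent and takes about 20 minutes to clear. Clearing it within 8 minutes would need about 8 agents.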

5. Agent Assigned But Not Starting

Symptom: Job stuck in assigned state
Cause: Job was assigned to an agent, but the agent hasn't accepted or started it

Possible causes:

  • Agent hooks failing (environment, pre-command)
  • Plugin installation failing
  • Disk space issues
  • Agent process problems

Fix: Check agent logs on the host machine.

Response Format

````markdown
## Agent Issue Diagnosed

**Build**: #456
**Stuck Job**: "Run Tests"
**State**: scheduled (waiting for agent)

### Job Requirements
- Queue: `deploy`
- Tags: `docker=true`

### Available Resources
- Queue `deploy`: ❌ Does not exist
- Queue `default`: 5 agents (none match)

### Root Cause
The job requires `queue=deploy` but no such queue exists in your cluster.

### Fix
**Immediate**: Change the pipeline to use `queue=default`:
```yaml
agents:
  queue: "default"
  docker: "true"
```

**Long-term**: Create a `deploy` queue and add dedicated agents for deployments.
````
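When several jobs need reports, the template above can be filled in programmatically. The Python sketch below assumes a hypothetical `Diagnosis` record whose fields correspond to the template's sections. The code fence is built from repeated backticks so the snippet can sit inside a Markdown code block.

```python
from dataclasses import dataclass

FENCE = "`" * 3  # built this way so the snippet can live inside a Markdown fence

@dataclass
class Diagnosis:  # hypothetical record; field names are illustrative
    build: int
    job: str
    state: str
    requirements: dict[str, str]   # e.g. {"queue": "deploy", "docker": "true"}
    resources: list[str]           # one line per queue checked
    root_cause: str
    fixed_agents: dict[str, str]   # agent tags for the immediate fix
    long_term: str

def render(d: Diagnosis) -> str:
    """Fill in the response template from a Diagnosis."""
    queue = d.requirements.get("queue", "default")
    other = [f"`{k}={v}`" for k, v in d.requirements.items() if k != "queue"]
    lines = [
        "## Agent Issue Diagnosed",
        "",
        f"**Build**: #{d.build}",
        f'**Stuck Job**: "{d.job}"',
        f"**State**: {d.state}",
        "",
        "### Job Requirements",
        f"- Queue: `{queue}`",
        f"- Tags: {', '.join(other) or 'none'}",
        "",
        "### Available Resources",
        *[f"- {r}" for r in d.resources],
        "",
        "### Root Cause",
        d.root_cause,
        "",
        "### Fix",
        "**Immediate**: Change the pipeline's agent targeting to:",
        FENCE + "yaml",
        "agents:",
        *[f'  {k}: "{v}"' for k, v in d.fixed_agents.items()],
        FENCE,
        "",
        f"**Long-term**: {d.long_term}",
    ]
    return "\n".join(lines)
```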

Diagnostic Commands

When explaining fixes, reference these Buildkite agent commands:

```bash
# Check the agent service status (systemd installs)
systemctl status buildkite-agent

# Start an agent with a specific queue and tags
buildkite-agent start --tags "queue=deploy,docker=true"

# Check agent logs (systemd installs)
journalctl -u buildkite-agent
```
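The `--tags` value in the start command is a comma-separated list of key=value pairs. Parsing it into a dict makes it easy to compare an agent's tags with a job's requirements. The Python sketch below has three limitations:

- It assumes tag values contain no commas.
- It keeps an entry without "=" with an empty value.
- It uses exact string comparison, which may be simpler than Buildkite's real matching.

```python
def parse_tags(spec: str) -> dict[str, str]:
    """Parse a comma-separated tag string like "queue=deploy,docker=true".

    Entries without "=" are kept with an empty value so they still show up
    when comparing against a job's requirements.
    """
    tags: dict[str, str] = {}
    for part in spec.split(","):
        part = part.strip()
        if part:
            key, _, value = part.partition("=")
            tags[key.strip()] = value.strip()
    return tags

def unmet(required: dict[str, str], agent_tags: dict[str, str]) -> dict[str, str]:
    """Requirements this agent does not satisfy (exact string comparison)."""
    return {k: v for k, v in required.items() if agent_tags.get(k) != v}
```

For example, an agent started with `queue=deploy,docker=true` leaves `{"gpu": "true"}` unmet for a job that targets `queue=deploy` and `gpu=true`.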

Example Interaction

User: My build is stuck waiting for an agent

1. Ask for build URL/number
2. Fetch build, find stuck job in "scheduled" state
3. Extract agent requirements: queue=special, gpu=true
4. List queues - "special" exists with 2 agents
5. Check queue details - agents have gpu=false
6. Explain: "Job needs gpu=true but queue agents don't have GPU tag"
7. Suggest: Add GPU agents or modify job requirements

Install

Requires askill CLI v1.0+

AI Quality Score

89/100 (analyzed 2/19/2026)

High-quality, comprehensive skill for diagnosing Buildkite agent issues. Excellent structure with clear sections, useful tables, concrete examples, and step-by-step guidance. Covers 5 common issues with diagnosis patterns and fixes. Well-suited for reusable CI/CD troubleshooting. Minor deduction for being Buildkite-specific, but otherwise excellent.


Metadata

License: unknown
Version: -
Updated: 1/29/2026
Publisher: mcncl

Tags

ci-cd