Agent Troubleshooting
Diagnose why jobs aren't being picked up by agents.
When to use
- "My build is stuck waiting for an agent"
- "Jobs aren't being picked up"
- "Why is my build stuck in scheduled?"
- "Agent not running my job"
- "Queue issues"
- "No agents available"
Available MCP Tools
| Tool | Purpose |
|---|---|
| `get_build` | Get job details including agent requirements |
| `list_clusters` | List available clusters |
| `get_cluster` | Get detailed cluster information |
| `list_cluster_queues` | List queues in a cluster |
| `get_cluster_queue` | Get queue stats (agent count, jobs waiting) |
Input Parsing
The user typically describes a symptom rather than a cause:
| Input | Likely Issue |
|---|---|
| "build stuck" | Job in scheduled state |
| "waiting for agent" | No matching agents |
| "job not starting" | Agent configuration mismatch |
| "queue problem" | Queue doesn't exist or no agents |
Ask for the build number or URL before investigating.
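A minimal sketch of pulling a build reference out of the user's message. It assumes the usual Buildkite URL shape (`https://buildkite.com/<org>/<pipeline>/builds/<number>`); the function name `parse_build_reference` and the returned dictionary keys are hypothetical, chosen only for illustration.

```python
import re
from typing import Optional

# Assumed URL shape: https://buildkite.com/<org>/<pipeline>/builds/<number>
BUILD_URL = re.compile(
    r"https?://buildkite\.com/(?P<org>[^/\s]+)/(?P<pipeline>[^/\s]+)/builds/(?P<number>\d+)"
)

def parse_build_reference(text: str) -> Optional[dict]:
    """Return org/pipeline/number from a URL, or only the number from '#456' or 'build 456'."""
    match = BUILD_URL.search(text)
    if match:
        return {
            "org": match.group("org"),
            "pipeline": match.group("pipeline"),
            "number": int(match.group("number")),
        }
    # Fall back to a bare build number; org and pipeline must then be asked for.
    match = re.search(r"(?:#|\bbuild\s+)(\d+)\b", text, re.IGNORECASE)
    if match:
        return {"org": None, "pipeline": None, "number": int(match.group(1))}
    return None
```

If the function returns `None`, or returns a number without an org and pipeline, ask the user for the missing pieces instead of guessing.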
Approach
1. Get the build with `buildkite_get_build`
   - Find the stuck job
   - Note its state (scheduled, assigned, etc.)
   - Extract agent query rules (queue, tags)
2. Check cluster/queue configuration
   - List clusters with `buildkite_list_clusters`
   - List queues with `buildkite_list_cluster_queues`
   - Get queue stats with `buildkite_get_cluster_queue`
3. Compare requirements vs availability
   - What does the job require?
   - What agents/queues exist?
   - Where's the mismatch?
4. Provide diagnosis and fix
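The first step can be sketched as follows. This assumes the build payload is a dictionary with a `jobs` list whose entries carry `name`, `state`, and `agent_query_rules` (a list of `key=value` strings); verify those field names against what the tool actually returns. `find_stuck_jobs` and `parse_rules` are hypothetical helper names.

```python
from typing import Dict, List

# States in which a job has not started running yet (see the job state table).
WAITING_STATES = {"scheduled", "assigned", "accepted"}

def parse_rules(rules: List[str]) -> Dict[str, str]:
    """Turn ['queue=deploy', 'docker=true'] into {'queue': 'deploy', 'docker': 'true'}."""
    parsed = {}
    for rule in rules:
        key, sep, value = rule.partition("=")
        if sep:  # skip entries without '=', such as a bare wildcard
            parsed[key.strip()] = value.strip()
    return parsed

def find_stuck_jobs(build: dict) -> List[dict]:
    """Collect jobs that are still waiting, with their agent requirements."""
    stuck = []
    for job in build.get("jobs", []):
        if job.get("state") in WAITING_STATES:
            stuck.append({
                "name": job.get("name"),
                "state": job["state"],
                "requirements": parse_rules(job.get("agent_query_rules") or []),
            })
    return stuck
```

The `requirements` dictionary is what gets compared against queue and agent data in the later steps.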
Job States for Agent Issues
| State | Meaning | Indicates |
|---|---|---|
| `scheduled` | Waiting for agent | No matching agent available |
| `assigned` | Assigned to an agent, not yet accepted | Agent has it but is not starting it |
| `accepted` | Agent accepted, about to start | Should run soon |
A job stuck in scheduled usually points to an agent matching or capacity problem.
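One way to turn the state table into a first triage step is a simple lookup. The wording of each hint and the `next_step` name are illustrative assumptions, not part of any Buildkite API.

```python
# Hypothetical mapping from job state to the first thing worth checking.
NEXT_STEP = {
    "scheduled": "Compare the job's queue and tags against the available agents.",
    "assigned": "Check the agent logs on the host; the job reached an agent but is not starting.",
    "accepted": "Wait briefly; the job should start running soon.",
}

def next_step(state: str) -> str:
    """Return a triage hint for a job state, with a fallback for other states."""
    return NEXT_STEP.get(state, "Not an agent-related state; inspect the job log instead.")
```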
Common Issues
1. Queue Mismatch
Symptom: Job stuck in scheduled
Cause: Job requires a queue that doesn't exist or has no agents
# Pipeline requires:
agents:
  queue: "deploy"
# But no agents are in the "deploy" queue
Diagnosis:
Job requires: queue=deploy
Available queues: default (5 agents), build (10 agents)
❌ No "deploy" queue exists
Fix: Add agents to the deploy queue, or change pipeline to use existing queue.
2. Tag Mismatch
Symptom: Job stuck in scheduled
Cause: Job requires tags that no agent has
# Pipeline requires:
agents:
  queue: "default"
  docker: "true"
  os: "linux"
# Agents have docker=true but os=macos
Diagnosis:
Job requires: queue=default, docker=true, os=linux
Available agents in default:
- agent-1: docker=true, os=macos
- agent-2: docker=true, os=macos
❌ No agent matches os=linux
Fix: Add Linux agents, or remove the os requirement.
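The tag comparison can be sketched as an exact key/value match. This is a simplification: it ignores wildcard rules and agents that carry several values for one tag, and it assumes agent tags are available as plain dictionaries. `missing_tags` and `explain_mismatch` are hypothetical names.

```python
from typing import Dict, List, Tuple

def missing_tags(required: Dict[str, str], agent_tags: Dict[str, str]) -> Dict[str, str]:
    """Return the required key/value pairs this agent does not satisfy."""
    return {k: v for k, v in required.items() if agent_tags.get(k) != v}

def explain_mismatch(
    required: Dict[str, str], agents: Dict[str, Dict[str, str]]
) -> Tuple[List[str], Dict[str, Dict[str, str]]]:
    """Split agents into those that match and, for the rest, what each is missing."""
    matching, gaps = [], {}
    for name, tags in agents.items():
        gap = missing_tags(required, tags)
        if gap:
            gaps[name] = gap
        else:
            matching.append(name)
    return matching, gaps
```

With the example above, both agents end up in `gaps` with `{"os": "linux"}`, which is exactly the requirement to report as unmatched.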
3. No Agents Running
Symptom: Job stuck in scheduled
Cause: Queue exists but no agents are connected
Diagnosis:
Job requires: queue=deploy
Queue "deploy" exists but has 0 connected agents
Fix: Start agents, check agent host health, verify network connectivity.
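Issues 1 and 3 differ only in whether the queue exists at all. A minimal sketch of that distinction, assuming queue stats have already been reduced to a mapping of queue name to connected agent count (the mapping shape and the `classify_queue` name are assumptions):

```python
from typing import Dict

def classify_queue(required_queue: str, agent_counts: Dict[str, int]) -> str:
    """Classify the required queue as 'missing', 'empty', or 'has_agents'."""
    if required_queue not in agent_counts:
        return "missing"      # Issue 1: the queue does not exist
    if agent_counts[required_queue] == 0:
        return "empty"        # Issue 3: the queue exists, no agents connected
    return "has_agents"       # look at tags or capacity next
```

A `has_agents` result does not mean the job can run; it only rules out the two queue-level causes.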
4. All Agents Busy
Symptom: Job stuck in scheduled longer than usual
Cause: Agents exist but are at capacity
Diagnosis:
Job requires: queue=default
Queue "default": 3 agents, 15 jobs waiting
Average wait time: 12 minutes
Fix: Scale up agents, reduce parallelism, or wait.
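A rough way to quantify "at capacity" is the number of waiting jobs per agent. The threshold of 2.0 below is an arbitrary illustration, not a Buildkite recommendation, and both function names are hypothetical.

```python
def queue_pressure(agents: int, jobs_waiting: int) -> float:
    """Waiting jobs per connected agent; infinite if jobs wait with no agents."""
    if agents <= 0:
        return float("inf") if jobs_waiting > 0 else 0.0
    return jobs_waiting / agents

def looks_saturated(agents: int, jobs_waiting: int, threshold: float = 2.0) -> bool:
    """Flag a queue whose backlog per agent meets or exceeds the (assumed) threshold."""
    return queue_pressure(agents, jobs_waiting) >= threshold
```

For the example above (3 agents, 15 jobs waiting) the pressure is 5 jobs per agent, which supports a "scale up or wait" diagnosis rather than a matching problem.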
5. Agent Assigned But Not Starting
Symptom: Job stuck in assigned state
Cause: The job reached an agent, but the agent can't start it
Possible causes:
- Agent hooks failing (environment, pre-command)
- Plugin installation failing
- Disk space issues
- Agent process problems
Fix: Check agent logs on the host machine.
Response Format
## Agent Issue Diagnosed
**Build**: #456
**Stuck Job**: "Run Tests"
**State**: scheduled (waiting for agent)
### Job Requirements
- Queue: `deploy`
- Tags: `docker=true`
### Available Resources
- Queue `deploy`: ❌ Does not exist
- Queue `default`: 5 agents (none match)
### Root Cause
The job requires `queue=deploy` but no such queue exists in your cluster.
### Fix
**Immediate**: Change the pipeline to use `queue=default`:
```yaml
agents:
  queue: "default"
  docker: "true"
```
**Long-term**: Create a `deploy` queue and add dedicated agents for deployments.
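If the report is assembled programmatically, the template above could be rendered along these lines. This is a sketch only: the function name, its parameters, and the convention that a queue count of `None` means "does not exist" are all assumptions made for illustration.

```python
from typing import Dict, Optional

def render_report(build_number: int, job_name: str, state: str,
                  requirements: Dict[str, str],
                  queues: Dict[str, Optional[int]],
                  root_cause: str, fix: str) -> str:
    """Build the Markdown report. A queue count of None means the queue does not exist."""
    queue = requirements.get("queue", "(none)")
    tags = ", ".join(f"`{k}={v}`" for k, v in requirements.items() if k != "queue")
    lines = [
        "## Agent Issue Diagnosed",
        f"**Build**: #{build_number}",
        f'**Stuck Job**: "{job_name}"',
        f"**State**: {state}",
        "### Job Requirements",
        f"- Queue: `{queue}`",
    ]
    if tags:
        lines.append(f"- Tags: {tags}")
    lines.append("### Available Resources")
    for name, count in queues.items():
        status = "❌ Does not exist" if count is None else f"{count} agents"
        lines.append(f"- Queue `{name}`: {status}")
    lines += ["### Root Cause", root_cause, "### Fix", fix]
    return "\n".join(lines)
```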
Diagnostic Commands
When explaining fixes, reference these Buildkite agent commands:
# Check agent service status (systemd hosts)
systemctl status buildkite-agent
# Start an agent with a specific queue and tags
buildkite-agent start --tags "queue=deploy,docker=true"
# Check agent logs (systemd hosts)
journalctl -u buildkite-agent
Example Interaction
User: My build is stuck waiting for an agent
1. Ask for build URL/number
2. Fetch build, find stuck job in "scheduled" state
3. Extract agent requirements: queue=special, gpu=true
4. List queues - "special" exists with 2 agents
5. Check queue details - agents have gpu=false
6. Explain: "Job needs gpu=true but queue agents don't have GPU tag"
7. Suggest: Add GPU agents or modify job requirements
