k8s-troubleshoot

Debug Kubernetes pods, nodes, and workloads. Use when pods are failing, containers crash, nodes are unhealthy, or users mention debugging, troubleshooting, or diagnosing Kubernetes issues.

818 stars
16.4k downloads
Updated 2/10/2026

SKILL.md

Kubernetes Troubleshooting

Expert debugging and diagnostics for Kubernetes clusters using kubectl-mcp-server tools.

When to Apply

Use this skill when:

  • User mentions: "debug", "troubleshoot", "diagnose", "failing", "crash", "not starting", "broken"
  • Pod states: Pending, CrashLoopBackOff, ImagePullBackOff, OOMKilled, Error, Unknown
  • Node issues: NotReady, MemoryPressure, DiskPressure, NetworkUnavailable, PIDPressure
  • Keywords: "logs", "events", "describe", "why isn't it working", "stuck", "not responding"

Priority Rules

| Priority | Rule | Impact | Tools |
|---|---|---|---|
| 1 | Check pod status first | CRITICAL | get_pods, describe_pod |
| 2 | View recent events | CRITICAL | get_events |
| 3 | Inspect logs (including previous) | HIGH | get_pod_logs |
| 4 | Check resource metrics | HIGH | get_pod_metrics |
| 5 | Verify endpoints | MEDIUM | get_endpoints |
| 6 | Review network policies | MEDIUM | get_network_policies |
| 7 | Examine node status | LOW | get_nodes, describe_node |

Quick Reference

| Symptom | First Tool | Next Steps |
|---|---|---|
| Pod Pending | describe_pod | Check events, node capacity, resource requests |
| CrashLoopBackOff | get_pod_logs(previous=True) | Check exit code, resources, liveness probes |
| ImagePullBackOff | describe_pod | Verify image name, registry auth, network |
| OOMKilled | get_pod_metrics | Increase memory limits, check for memory leaks |
| ContainerCreating | describe_pod | Check PVC binding, secrets, configmaps |
| Terminating (stuck) | describe_pod | Check finalizers, PDBs, preStop hooks |
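
The quick reference can be encoded as a small lookup so the first tool is chosen consistently. This is a minimal sketch: `FIRST_TOOL` and `first_tool` are hypothetical names, not part of kubectl-mcp-server, and falling back to describe_pod for unlisted symptoms is an assumption. Only the tool names and the `previous=True` argument come from the table.

```python
# Hypothetical lookup built from the Quick Reference table.
# Values are (tool name, keyword arguments) pairs; nothing is executed here.
FIRST_TOOL = {
    "Pending": ("describe_pod", {}),
    "CrashLoopBackOff": ("get_pod_logs", {"previous": True}),
    "ImagePullBackOff": ("describe_pod", {}),
    "OOMKilled": ("get_pod_metrics", {}),
    "ContainerCreating": ("describe_pod", {}),
    "Terminating": ("describe_pod", {}),
}


def first_tool(symptom: str) -> tuple[str, dict]:
    """Return the first tool for a symptom (assumed default: describe_pod)."""
    return FIRST_TOOL.get(symptom, ("describe_pod", {}))
```

For example, `first_tool("CrashLoopBackOff")` yields the log tool with `previous=True`, matching the table row.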

Diagnostic Workflows

Pod Not Starting

1. get_pods(namespace, label_selector) - Get pod status
2. describe_pod(name, namespace) - See events and conditions
3. get_events(namespace, field_selector="involvedObject.name=<pod>") - Check events
4. get_pod_logs(name, namespace, previous=True) - For crash loops
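
The steps above can be sketched as a function that builds the ordered tool calls without executing them. The function name and the `{"tool": ..., "args": ...}` layout are assumptions made for illustration; a real caller would dispatch each entry to the matching kubectl-mcp-server tool. The optional `label_selector` from step 1 is left out for brevity.

```python
def pod_not_starting_plan(name: str, namespace: str, crash_looping: bool = False) -> list[dict]:
    """Build the ordered tool calls for the 'Pod Not Starting' workflow.

    The returned dicts only describe the calls; the layout is hypothetical.
    """
    plan = [
        {"tool": "get_pods", "args": {"namespace": namespace}},
        {"tool": "describe_pod", "args": {"name": name, "namespace": namespace}},
        {
            "tool": "get_events",
            "args": {
                "namespace": namespace,
                # Narrow events to this pod, as in step 3.
                "field_selector": f"involvedObject.name={name}",
            },
        },
    ]
    if crash_looping:
        # Logs from the previous container instance only exist after a restart.
        plan.append(
            {
                "tool": "get_pod_logs",
                "args": {"name": name, "namespace": namespace, "previous": True},
            }
        )
    return plan
```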

Common Pod States

| State | Likely Cause | Tools to Use |
|---|---|---|
| Pending | Scheduling issues | describe_pod, get_nodes, get_events |
| ImagePullBackOff | Registry/auth | describe_pod, check image name |
| CrashLoopBackOff | App crash | get_pod_logs(previous=True) |
| OOMKilled | Memory limit | get_pod_metrics, adjust limits |
| ContainerCreating | Volume/network | describe_pod, get_pvc |

Node Issues

1. get_nodes() - List nodes and status
2. describe_node(name) - See conditions and capacity
3. Check conditions: Ready, MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable
4. node_logs_tool(name, "kubelet") - Kubelet logs
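
Interpreting node conditions is easy to get backwards: a healthy node reports Ready as "True" but every pressure condition as "False". The sketch below flags conditions that deviate from that. It assumes the input follows the Node `status.conditions` shape from the Kubernetes API; whether describe_node returns exactly this structure is not confirmed by this document, and `unhealthy_conditions` is a hypothetical helper.

```python
# Expected status for a healthy node: Ready is "True", every pressure or
# availability condition is "False".
HEALTHY_STATUS = {
    "Ready": "True",
    "MemoryPressure": "False",
    "DiskPressure": "False",
    "PIDPressure": "False",
    "NetworkUnavailable": "False",
}


def unhealthy_conditions(conditions: list[dict]) -> list[str]:
    """Return the condition types whose status differs from the healthy value.

    `conditions` is assumed to be a list of entries like
    {"type": "Ready", "status": "True" | "False" | "Unknown"}.
    """
    problems = []
    for cond in conditions:
        expected = HEALTHY_STATUS.get(cond.get("type"))
        if expected is not None and cond.get("status") != expected:
            problems.append(cond["type"])
    return problems
```

A status of "Unknown" is also reported as a problem, which matches the NotReady case where the kubelet has stopped posting status.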

Deep Debugging Workflows

CrashLoopBackOff Investigation

1. get_pod_logs(name, namespace, previous=True) - See why it crashed
2. describe_pod(name, namespace) - Check resource limits, probes
3. get_pod_metrics(name, namespace) - Current memory/CPU usage (a snapshot, not usage at crash time)
4. If OOM: compare requests/limits to actual usage
5. If app error: check logs for stack trace
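
The OOM versus app-error branch in steps 4 and 5 can be sketched as a classifier over the container's last termination record. This assumes the pod status shape `containerStatuses[].lastState.terminated` with `reason` and `exitCode` fields; `classify_termination` and its return labels are hypothetical, and the output format of describe_pod is not specified here.

```python
def classify_termination(terminated: dict) -> str:
    """Classify a container's last termination.

    `terminated` is assumed to carry `reason` and `exitCode`, as in the
    pod status field `containerStatuses[].lastState.terminated`.
    """
    reason = terminated.get("reason", "")
    exit_code = terminated.get("exitCode")
    if reason == "OOMKilled":
        # Step 4: compare requests/limits to actual usage.
        return "oom"
    if exit_code == 0:
        # Exited cleanly; check whether the process is meant to keep running.
        return "completed"
    if exit_code is not None and exit_code > 128:
        # Killed by a signal: signal number is exit_code - 128 (137 = SIGKILL).
        return "signal"
    # Step 5: look for a stack trace in the previous logs.
    return "app-error"
```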

Networking Issues

1. get_services(namespace) - Verify service exists
2. get_endpoints(namespace) - Check endpoint backends
3. If endpoints are empty: pods don't match the service selector, or no matching pod is Ready
4. get_network_policies(namespace) - Check traffic rules
5. For Cilium: cilium_endpoints_list_tool(), hubble_flows_query_tool()
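
The selector check in step 3 comes down to a subset test: every key/value pair in the service selector must appear in the pod's labels. A minimal sketch with hypothetical helper names, operating on plain dicts rather than on tool output:

```python
def selector_matches(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """True when every key/value pair in the service selector is on the pod."""
    # An empty selector is treated as matching nothing here, since a Service
    # without a selector gets no automatically managed endpoints.
    if not selector:
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


def matching_pods(selector: dict[str, str], pods: dict[str, dict[str, str]]) -> list[str]:
    """Return sorted names of pods (name -> labels) the selector would pick up."""
    return sorted(name for name, labels in pods.items() if selector_matches(selector, labels))
```

An empty result from `matching_pods` points at a label or selector typo; a non-empty result with empty endpoints suggests looking at pod readiness instead.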

Storage Problems

1. get_pvc(namespace) - Check PVC status
2. describe_pvc(name, namespace) - See binding issues
3. get_storage_classes() - Verify provisioner exists
4. If Pending: check storage class, access modes
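
Step 4 can be sketched as a check of each Pending PVC against the known storage classes. The input here is a simplified, hypothetical view (`name`, `phase`, `storage_class`), not the raw API object or the actual output of get_pvc and get_storage_classes, and the hint strings are illustrative only.

```python
def pending_pvc_hints(pvcs: list[dict], storage_classes: set[str]) -> dict[str, str]:
    """Map each Pending PVC name to a hint about what to check next.

    Each PVC dict is assumed to carry `name`, `phase`, and optionally
    `storage_class` (a simplified view for illustration).
    """
    hints = {}
    for pvc in pvcs:
        if pvc.get("phase") != "Pending":
            continue  # Bound or Lost claims are out of scope for this check
        sc = pvc.get("storage_class")
        if sc is None:
            hints[pvc["name"]] = "no storage class set; check for a default StorageClass"
        elif sc not in storage_classes:
            hints[pvc["name"]] = f"storage class '{sc}' not found"
        else:
            hints[pvc["name"]] = "storage class exists; check access modes and provisioner events"
    return hints
```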

DNS Resolution

1. kubectl_exec(pod, namespace, "nslookup kubernetes.default") - Test DNS
2. If it fails: check the CoreDNS pods in kube-system
3. get_pods(namespace="kube-system", label_selector="k8s-app=kube-dns")
4. get_pod_logs(name="<coredns-pod>", namespace="kube-system") - Use a pod name from step 3

Multi-Cluster Debugging

All tools support a context parameter for targeting different clusters:

get_pods(namespace="kube-system", context="production-cluster")
get_events(namespace="default", context="staging-cluster")
describe_pod(name="myapp-xyz", namespace="prod", context="prod-east")
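
Because the context is just a keyword argument, the same check can be repeated across clusters in a loop. The sketch below assumes each tool is reachable as a Python callable accepting `context=`; `run_across_contexts` and `fake_get_pods` are hypothetical, and the fake stands in for a real tool call so the example runs on its own.

```python
from typing import Any, Callable


def run_across_contexts(tool: Callable[..., Any], contexts: list[str], **kwargs: Any) -> dict[str, Any]:
    """Call `tool` once per context and collect results keyed by context name.

    Errors are captured per context so one unreachable cluster does not
    hide results from the others.
    """
    results = {}
    for context in contexts:
        try:
            results[context] = tool(context=context, **kwargs)
        except Exception as exc:  # keep going; record the failure instead
            results[context] = f"error: {exc}"
    return results


# Stand-in for a real tool call, used only to make the sketch runnable.
def fake_get_pods(namespace: str, context: str) -> list[str]:
    if context == "staging-cluster":
        raise ConnectionError("cluster unreachable")
    return [f"{context}/{namespace}/coredns"]


summary = run_across_contexts(
    fake_get_pods, ["production-cluster", "staging-cluster"], namespace="kube-system"
)
```

Recording failures per context keeps a single bad kubeconfig entry from aborting the whole comparison.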

Diagnostic Scripts

For comprehensive diagnostics, run the bundled scripts:

Decision Tree

See references/DECISION-TREE.md for visual troubleshooting flowcharts.

Common Errors Reference

See references/COMMON-ERRORS.md for error message explanations and fixes.

Related Tools

Core Diagnostics

  • get_pods, describe_pod, get_pod_logs, get_pod_metrics
  • get_events, get_nodes, describe_node
  • get_resource_usage, compare_namespaces

Advanced (Ecosystem)

  • Cilium: cilium_endpoints_list_tool, hubble_flows_query_tool
  • Istio: istio_proxy_status_tool, istio_analyze_tool

Install

Requires askill CLI v1.0+

AI Quality Score

92/100 (analyzed 2/19/2026)

High-quality technical reference skill for Kubernetes troubleshooting with excellent structure, comprehensive coverage of diagnostic scenarios (pods, nodes, networking, storage, DNS), clear priority rules, and actionable workflows. Includes metadata, tags, and proper frontmatter. Well-suited for reuse across K8s environments despite being tied to a specific tool implementation.

Metadata

License: unknown
Version: -
Updated: 2/10/2026
Publisher: rohitg00

Tags

observability, security, testing