askill
platxa-k8s-ops

Kubernetes operations automation for the Platxa platform. Debug instances, manage clusters, scale deployments, and perform infrastructure operations with guided workflows.

4 stars
1.2k downloads
Updated 2/11/2026

SKILL.md

Platxa K8s Operations

Automated Kubernetes operations for the Platxa platform with guided debugging workflows.

Overview

This skill provides operational commands for managing Platxa's Kubernetes infrastructure:

| Category | Operations |
|----------|------------|
| Cluster | Setup, health check, node status |
| Instance | Status, logs, events, scale, wake, shell |
| Helm | Diff, sync, release status |
| Debug | Pod, network, storage, ingress diagnostics |
| Monitor | Service health, resource usage, alerts |

Prerequisites

Required tools (verify with which <tool>):

  • kubectl - Kubernetes CLI (configured with cluster access)
  • helm - Helm package manager
  • helmfile - Declarative Helm releases

Verify cluster connectivity:

kubectl cluster-info
kubectl get nodes
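
The tool check above can be scripted before any cluster command runs. The following is a minimal sketch, assuming a POSIX shell; the function name check_tools is hypothetical and not part of the skill, and it uses command -v in place of which because command -v is a shell builtin.

```shell
# check_tools: report which of the given commands are missing from PATH.
# Returns 0 when every tool is present, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be: check_tools kubectl helm helmfile && kubectl cluster-info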

Operations Reference

Cluster Operations

| Operation | Command | Description |
|-----------|---------|-------------|
| Setup Kind | ./install.sh kind | Install local Kind cluster |
| Setup DOKS | ./install.sh doks | Install DOKS production cluster |
| Health Check | kubectl get pods -A | Check all pod statuses |
| Node Status | kubectl get nodes -o wide | List nodes with IPs, OS image, and container runtime |

Instance Operations

| Operation | Command | Description |
|-----------|---------|-------------|
| List All | kubectl get ns -l platxa.io/tier=instance | List instance namespaces |
| Status | kubectl get all -n instance-{name} | Instance resources |
| Logs | kubectl logs -n instance-{name} -l app=odoo --tail=100 | View logs |
| Events | kubectl get events -n instance-{name} --sort-by='.lastTimestamp' | Recent events |
| Scale Up | kubectl scale deploy odoo-{name} -n instance-{name} --replicas=1 | Wake instance |
| Scale Down | kubectl scale deploy odoo-{name} -n instance-{name} --replicas=0 | Sleep instance |
| Shell | kubectl exec -n instance-{name} -it deploy/odoo-{name} -- /bin/bash | Access pod |
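
Every instance command above derives its namespace and deployment from the instance name. As a sketch of that convention, the helpers below build the names and print (without running) the scale command. The helper names instance_ns, instance_deploy, and scale_cmd are hypothetical; only the instance-{name} and odoo-{name} patterns come from the table.

```shell
# Naming convention from the table above: namespace "instance-{name}",
# deployment "odoo-{name}". The helper names are illustrative only.
instance_ns()     { printf 'instance-%s\n' "$1"; }
instance_deploy() { printf 'odoo-%s\n' "$1"; }

# scale_cmd: print the scale command for an instance and replica count.
# It only prints, so the result can be shown to the user before running.
scale_cmd() {
  printf 'kubectl scale deploy %s -n %s --replicas=%s\n' \
    "$(instance_deploy "$1")" "$(instance_ns "$1")" "$2"
}
```

For example, scale_cmd abc123xy 0 prints the Scale Down command for instance abc123xy.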

Infrastructure Operations

| Operation | Command | Description |
|-----------|---------|-------------|
| Helm Diff | helmfile -e kind diff | Preview changes |
| Helm Sync | helmfile -e kind sync | Apply releases |
| Release Status | helm list -A | All Helm releases |

Workflow

When a user requests K8s operations, follow this workflow:

Step 1: Identify Operation Type

Detect intent from user message:

  • "debug", "not working", "help" → Instance debugging
  • "health", "status", "check" → Cluster health check
  • "deploy", "update", "sync" → Infrastructure update
  • "wake", "scale", "start" → Instance scaling
  • "logs", "events" → Instance investigation
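
The keyword mapping above could be approximated with a case statement. This is a rough sketch, not the skill's actual detection logic: the function name detect_intent and the returned labels are hypothetical, and the branch order (first match wins) is a choice made here to resolve messages that contain several keywords.

```shell
# detect_intent: map a free-text request to one operation type.
# Keywords come from the list above; matching is case-insensitive
# substring matching, and the first matching branch wins.
detect_intent() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *debug*|*"not working"*|*help*) echo "instance-debugging" ;;
    *health*|*status*|*check*)      echo "cluster-health" ;;
    *deploy*|*update*|*sync*)       echo "infrastructure-update" ;;
    *wake*|*scale*|*start*)         echo "instance-scaling" ;;
    *logs*|*events*)                echo "instance-investigation" ;;
    *)                              echo "unknown" ;;
  esac
}
```

Substring matching is deliberately crude: "restarting" contains "start", so a crash-loop report would be routed to scaling. A message that matches nothing returns unknown, which is a signal to ask the user what they need.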

Step 2: Gather Context

Before executing commands:

  1. Identify target (instance name, namespace, release)
  2. Verify current cluster context: kubectl config current-context
  3. Check if target exists: kubectl get ns <namespace>
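
The context checks above can be combined into one guard that runs before any operation. A minimal sketch, assuming kubectl is on PATH; the function name preflight is hypothetical.

```shell
# preflight: print the current context and confirm the namespace exists.
# Returns non-zero when either check fails; it never modifies the cluster.
preflight() {
  ns=$1
  ctx=$(kubectl config current-context) || {
    echo "no current kubectl context" >&2
    return 1
  }
  echo "context: $ctx"
  if ! kubectl get ns "$ns" >/dev/null 2>&1; then
    echo "namespace not found: $ns" >&2
    return 1
  fi
  echo "target: $ns"
}
```

Printing the context first makes it harder to run a command against a production cluster while believing the target is a local Kind cluster.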

Step 3: Execute Operation

Run the appropriate commands from the Operations Reference above. Always show each command before executing it, for transparency.
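
One way to make "show the command before executing" automatic is a small wrapper. This is a sketch; the name run_cmd is hypothetical.

```shell
# run_cmd: echo the command line to stderr, then execute it unchanged.
# The wrapped command's exit status is returned to the caller.
run_cmd() {
  printf '+ %s\n' "$*" >&2
  "$@"
}
```

For example, run_cmd kubectl get pods -A prints "+ kubectl get pods -A" and then runs it. Writing the echo to stderr keeps stdout clean for piping the command's own output.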

Step 4: Analyze Results

Interpret output and identify:

  • Status indicators (Running, Pending, Error, CrashLoopBackOff)
  • Resource issues (OOMKilled, insufficient CPU/memory)
  • Configuration problems (missing secrets, wrong image)
  • Network issues (service unreachable, policy blocking)
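
The indicators above can be bucketed mechanically as a first pass. In the sketch below, the status strings are values kubectl reports in the STATUS column or as a termination reason, while the function name classify_status and the category labels are this example's own.

```shell
# classify_status: bucket a pod status or reason string into a broad category.
classify_status() {
  case "$1" in
    Running|Completed)             echo "healthy" ;;
    OOMKilled)                     echo "resource" ;;
    ImagePullBackOff|ErrImagePull) echo "configuration" ;;
    CreateContainerConfigError)    echo "configuration" ;;
    Pending|ContainerCreating)     echo "waiting" ;;
    CrashLoopBackOff|Error)        echo "crashing" ;;
    *)                             echo "unknown" ;;
  esac
}
```

A category only narrows the search. CrashLoopBackOff, for instance, still requires the previous container's logs to find the actual cause.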

Step 5: Recommend Actions

Based on findings:

  • Provide specific fix commands
  • Suggest follow-up checks
  • Link to relevant troubleshooting guide sections

Examples

Example 1: Debug Non-Responsive Instance

User: "The abc123xy instance isn't loading"

Response:

Checking instance status...

kubectl get all -n instance-abc123xy
# Output: deployment replicas=0

The instance is scaled to zero (sleeping). Waking it up:

kubectl scale deploy odoo-abc123xy -n instance-abc123xy --replicas=1
kubectl wait --for=condition=available deploy/odoo-abc123xy -n instance-abc123xy --timeout=120s

Instance is now running. Access at: https://abc123xy.platxa.com

Example 2: Pre-Deployment Health Check

User: "Is the cluster healthy before I deploy?"

Response:

Running cluster health check...

Nodes:
kubectl get nodes
# All nodes Ready ✓

Core Services:
kubectl get pods -n traefik-system
kubectl get pods -n postgres-system
kubectl get pods -n monitoring
# All pods Running ✓

Resource Availability:
kubectl top nodes
# CPU/Memory within limits ✓

Cluster is healthy. Safe to proceed with deployment.
Preview changes with: helmfile -e kind diff
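
The "All nodes Ready" check in this example can be made explicit instead of read by eye. A small sketch, assuming awk is available; the function name not_ready_nodes is hypothetical.

```shell
# not_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print the name of every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}
```

Usage: kubectl get nodes --no-headers | not_ready_nodes. Empty output means every node is Ready. Cordoned nodes report "Ready,SchedulingDisabled" and are therefore listed too, which is usually worth knowing before a deployment.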

Example 3: Investigate Pod CrashLoop

User: "Instance demo keeps restarting"

Response:

Investigating crash loop...

Pod Status:
kubectl get pods -n instance-demo -l app=odoo
# STATUS: CrashLoopBackOff, Restarts: 5

Recent Events:
kubectl get events -n instance-demo --sort-by='.lastTimestamp' | tail -20
# Event: OOMKilled

Container Logs (last crash):
kubectl logs -n instance-demo -l app=odoo --previous --tail=50
# Error: Cannot allocate memory

Root Cause: Out of Memory (OOMKilled)

Fix Options:
1. Increase memory limit in instance tier
2. Reduce Odoo workers: kubectl edit configmap odoo-config -n instance-demo
3. Check for memory leaks in custom addons
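
To confirm the OOM diagnosis without scanning events, the last termination reason can be read from the pod status. This is a sketch under assumptions: the function names are hypothetical, and app=odoo is the label used in the commands above.

```shell
# last_termination_reasons: print the last termination reason of each
# container in pods labelled app=odoo in the given namespace.
last_termination_reasons() {
  kubectl get pods -n "$1" -l app=odoo \
    -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}'
}

# was_oom_killed: succeed when any container's last termination was OOMKilled.
was_oom_killed() {
  last_termination_reasons "$1" | grep -q OOMKilled
}
```

For example: was_oom_killed instance-demo && echo "raise the memory limit or reduce workers"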

Example 4: Helm Release Update

User: "Update the Traefik ingress controller"

Response:

Previewing Traefik update...

helmfile -e kind -l name=traefik diff

Changes:
- image: traefik:v2.10.5 -> traefik:v2.11.0
- Added new middleware configuration

After you confirm, apply the change with sync:

helmfile -e kind -l name=traefik sync

Verifying deployment:
kubectl rollout status deploy/traefik -n traefik-system

Traefik updated successfully. All ingress routes operational.

Error Handling

Connection Errors

Symptom: Unable to connect to the server

Causes:

  • Kubeconfig not set or expired
  • Cluster not running (Kind)
  • Network issues (DOKS)

Fix:

# Kind: confirm the cluster exists, then re-export its kubeconfig
kind get clusters
kind export kubeconfig --name platxa

# DOKS: Refresh credentials
doctl kubernetes cluster kubeconfig save <cluster-id>

Permission Denied

Symptom: forbidden: User cannot <action>

Causes:

  • RBAC role not bound
  • Service account missing permissions

Fix: Check and apply RBAC:

kubectl auth can-i <verb> <resource> -n <namespace>
# If denied, apply appropriate RoleBinding
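
When it is unclear which permission is missing, the can-i check can be looped over several verbs. A minimal sketch; the function name check_access is hypothetical.

```shell
# check_access: run `kubectl auth can-i` for several verbs on one resource
# and print one "verb resource: yes|no" line per verb.
check_access() {
  ns=$1; resource=$2; shift 2
  for verb in "$@"; do
    answer=$(kubectl auth can-i "$verb" "$resource" -n "$ns" 2>/dev/null)
    printf '%s %s: %s\n' "$verb" "$resource" "${answer:-no}"
  done
}
```

For example: check_access instance-demo deployments get list patch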

Resource Not Found

Symptom: NotFound: <resource> not found

Causes:

  • Wrong namespace
  • Resource deleted
  • Typo in name

Fix: Verify resource exists:

kubectl get <resource-type> -A | grep <name>
kubectl get ns | grep instance

Pod Stuck Pending

Symptom: Pod in Pending state

Causes:

  • Insufficient resources
  • PVC not bound
  • Node selector mismatch

Fix:

kubectl describe pod <pod> -n <namespace>
# Check Events section for scheduling failure reason

Safety

Read-Only Operations (Safe)

  • get, describe, logs, events - No cluster changes
  • diff - Preview only, no apply

Write Operations (Caution)

  • scale - Changes replica count
  • sync - Applies Helm releases
  • delete - Removes resources

Dangerous Operations (Require Confirmation)

  • kubectl delete ns - Deletes entire namespace
  • helmfile destroy - Removes all releases
  • kubectl drain - Cordons the node and evicts its pods

Always preview changes with diff before sync. Never run destructive commands without explicit user confirmation.
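
The three tiers above can be encoded as a lookup that runs before any command is executed. The sketch below is illustrative only: the function name risk_level and the "unclassified" label are assumptions, and the patterns cover just the commands named in this document.

```shell
# risk_level: classify a command line using the three tiers above.
# Dangerous patterns are tested first so "kubectl delete ns" is not
# downgraded to the generic "kubectl delete" write tier.
risk_level() {
  case "$*" in
    "kubectl delete ns "*|"kubectl delete namespace "*)    echo "dangerous" ;;
    "helmfile destroy"*|"helmfile "*" destroy"*)           echo "dangerous" ;;
    "kubectl drain "*)                                     echo "dangerous" ;;
    "kubectl scale "*|"kubectl delete "*)                  echo "write" ;;
    "helmfile sync"*|"helmfile "*" sync"*)                 echo "write" ;;
    "kubectl get "*|"kubectl describe "*|"kubectl logs "*) echo "read-only" ;;
    "helmfile diff"*|"helmfile "*" diff"*)                 echo "read-only" ;;
    *)                                                     echo "unclassified" ;;
  esac
}
```

A cautious caller would treat unclassified the same as dangerous and ask for confirmation, since the absence of a match says nothing about whether a command is safe.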

Output Checklist

After completing an operation, verify:

  • Command executed successfully (exit code 0)
  • Output analyzed and interpreted
  • Issues identified (if any)
  • Fix recommendations provided
  • Follow-up actions suggested
  • User can proceed with confidence

Install

Requires askill CLI v1.0+


Metadata

License: unknown
Version: -
Updated: 2/11/2026
Publisher: platxa

Tags

automation, devops, infrastructure, kubernetes