Platxa K8s Operations

Automated Kubernetes operations for the Platxa platform with guided debugging workflows.

Overview

This skill provides operational commands for managing Platxa's Kubernetes infrastructure:

| Category | Operations |
| --- | --- |
| Cluster | Setup, health check, node status |
| Instance | Status, logs, events, scale, wake, shell |
| Helm | Diff, sync, release status |
| Debug | Pod, network, storage, ingress diagnostics |
| Monitor | Service health, resource usage, alerts |

Prerequisites

Required tools (verify with which <tool>):

kubectl - Kubernetes CLI (configured with cluster access)
helm - Helm package manager
helmfile - Declarative Helm releases

Verify cluster connectivity:

kubectl cluster-info
kubectl get nodes
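The checks above can be bundled into one preflight step. The following is a minimal sketch, not part of the skill itself: the function names `missing_tools` and `preflight` are hypothetical, and it uses `command -v` (the POSIX counterpart of `which`) to look up each tool.

```shell
# Print the name of every given tool that is not on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Hypothetical preflight helper: fails if a required tool is missing
# or if the cluster cannot be reached.
preflight() {
    missing=$(missing_tools kubectl helm helmfile)
    if [ -n "$missing" ]; then
        printf 'Missing required tools:\n%s\n' "$missing" >&2
        return 1
    fi
    # Only reached when all tools are installed.
    kubectl cluster-info >/dev/null
}
```

Calling `preflight` before any other operation would surface a missing tool or an unreachable cluster early, rather than partway through a workflow.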

Operations Reference

Cluster Operations

| Operation | Command | Description |
| --- | --- | --- |
| Setup Kind | `./install.sh kind` | Install local Kind cluster |
| Setup DOKS | `./install.sh doks` | Install DOKS production cluster |
| Health Check | `kubectl get pods -A` | Check all pod statuses |
| Node Status | `kubectl get nodes -o wide` | List nodes with IPs, OS image, kernel, and container runtime |

Instance Operations

| Operation | Command | Description |
| --- | --- | --- |
| List All | `kubectl get ns -l platxa.io/tier=instance` | List instance namespaces |
| Status | `kubectl get all -n instance-{name}` | Instance resources |
| Logs | `kubectl logs -n instance-{name} -l app=odoo --tail=100` | View logs |
| Events | `kubectl get events -n instance-{name} --sort-by='.lastTimestamp'` | Recent events |
| Scale Up | `kubectl scale deploy odoo-{name} -n instance-{name} --replicas=1` | Wake instance |
| Scale Down | `kubectl scale deploy odoo-{name} -n instance-{name} --replicas=0` | Sleep instance |
| Shell | `kubectl exec -n instance-{name} -it deploy/odoo-{name} -- /bin/bash` | Access pod |

Infrastructure Operations

| Operation | Command | Description |
| --- | --- | --- |
| Helm Diff | `helmfile -e kind diff` | Preview changes |
| Helm Sync | `helmfile -e kind sync` | Apply releases |
| Release Status | `helm list -A` | All Helm releases |

Workflow

When a user requests K8s operations, follow this workflow:

Step 1: Identify Operation Type

Detect intent from user message:

"debug", "not working", "help" → Instance debugging
"health", "status", "check" → Cluster health check
"deploy", "update", "sync" → Infrastructure update
"wake", "scale", "start" → Instance scaling
"logs", "events" → Instance investigation
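The keyword mapping above could be sketched as a simple substring match. This is an illustrative assumption about how intent detection might work, not the skill's actual mechanism; the function name `detect_intent` and the returned labels are hypothetical.

```shell
# Map a free-form request to one of the operation types listed above.
# The first matching pattern wins, so the order mirrors the list.
detect_intent() {
    # Lowercase the message so matching is case-insensitive.
    msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$msg" in
        *debug*|*"not working"*|*help*) echo "instance-debugging" ;;
        *health*|*status*|*check*)      echo "cluster-health" ;;
        *deploy*|*update*|*sync*)       echo "infrastructure-update" ;;
        *wake*|*scale*|*start*)         echo "instance-scaling" ;;
        *logs*|*events*)                echo "instance-investigation" ;;
        *)                              echo "unknown" ;;
    esac
}
```

Plain substring matching is crude: a message such as "Instance demo keeps restarting" contains "start" and would be routed to instance scaling, so a fallback to asking the user is worth keeping for ambiguous or unmatched requests.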

Step 2: Gather Context

Before executing commands:

Identify target (instance name, namespace, release)
Verify current cluster context: kubectl config current-context
Check if target exists: kubectl get ns <namespace>
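As a rough sketch of this step, the helpers below derive the namespace from an instance name using the `instance-{name}` pattern from the Operations Reference, then run the two checks. The function names `instance_namespace` and `gather_context` are hypothetical.

```shell
# Instance namespaces follow the instance-{name} pattern.
instance_namespace() {
    printf 'instance-%s\n' "$1"
}

# Hypothetical helper: show the active context and confirm the
# target namespace exists before any operation runs.
gather_context() {
    ns=$(instance_namespace "$1")
    echo "Context: $(kubectl config current-context)"
    if kubectl get ns "$ns" >/dev/null 2>&1; then
        echo "Target namespace $ns exists"
    else
        echo "Namespace $ns not found" >&2
        return 1
    fi
}
```

For example, `gather_context abc123xy` would check `instance-abc123xy` in whichever context is currently active.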

Step 3: Execute Operation

Run the appropriate commands from the Operations Reference above. Always show each command before executing it, for transparency.

Step 4: Analyze Results

Interpret output and identify:

Status indicators (Running, Pending, Error, CrashLoopBackOff)
Resource issues (OOMKilled, insufficient CPU/memory)
Configuration problems (missing secrets, wrong image)
Network issues (service unreachable, policy blocking)
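One way to make this interpretation repeatable is to bucket the value of the STATUS column from `kubectl get pods`. The sketch below is an assumption about a reasonable grouping, not a definitive mapping; the function name `classify_status` and the category labels are hypothetical, while the status strings are ones kubectl reports.

```shell
# Bucket a pod STATUS value into a coarse diagnostic category.
classify_status() {
    case "$1" in
        Running|Completed)
            echo "healthy" ;;
        Pending|ContainerCreating)
            echo "scheduling" ;;
        OOMKilled)
            echo "resource" ;;
        ImagePullBackOff|ErrImagePull|CreateContainerConfigError)
            echo "configuration" ;;
        CrashLoopBackOff|Error)
            echo "crash" ;;
        *)
            echo "unknown" ;;
    esac
}
```

The input could come from the third column of `kubectl get pods -n <namespace> --no-headers`. A "crash" result only says the container keeps exiting; the events and previous logs are still needed to find out why.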

Step 5: Recommend Actions

Based on findings:

Provide specific fix commands
Suggest follow-up checks
Link to relevant troubleshooting guide sections

Examples

Example 1: Debug Non-Responsive Instance

User: "The abc123xy instance isn't loading"

Response:

Checking instance status...

kubectl get all -n instance-abc123xy
# Output: deployment replicas=0

The instance is scaled to zero (sleeping). Waking it up:

kubectl scale deploy odoo-abc123xy -n instance-abc123xy --replicas=1
kubectl wait --for=condition=available deploy/odoo-abc123xy -n instance-abc123xy --timeout=120s

Instance is now running. Access at: https://abc123xy.platxa.com

Example 2: Pre-Deployment Health Check

User: "Is the cluster healthy before I deploy?"

Response:

Running cluster health check...

Nodes:
kubectl get nodes
# All nodes Ready ✓

Core Services:
kubectl get pods -n traefik-system
kubectl get pods -n postgres-system
kubectl get pods -n monitoring
# All pods Running ✓

Resource Availability:
kubectl top nodes
# CPU/Memory within limits ✓

Cluster is healthy. Safe to proceed with deployment.
Preview changes with: helmfile -e kind diff

Example 3: Investigate Pod CrashLoop

User: "Instance demo keeps restarting"

Response:

Investigating crash loop...

Pod Status:
kubectl get pods -n instance-demo -l app=odoo
# STATUS: CrashLoopBackOff, Restarts: 5

Recent Events:
kubectl get events -n instance-demo --sort-by='.lastTimestamp' | tail -20
# Event: OOMKilled

Container Logs (last crash):
kubectl logs -n instance-demo -l app=odoo --previous --tail=50
# Error: Cannot allocate memory

Root Cause: Out of Memory (OOMKilled)

Fix Options:
1. Increase memory limit in instance tier
2. Reduce Odoo workers: kubectl edit configmap odoo-config -n instance-demo
3. Check for memory leaks in custom addons

Example 4: Helm Release Update

User: "Update the Traefik ingress controller"

Response:

Previewing Traefik update...

helmfile -e kind -l name=traefik diff

Changes:
- image: traefik:v2.10.5 -> traefik:v2.11.0
- Added new middleware configuration

Once the diff is confirmed, apply it with sync:

helmfile -e kind -l name=traefik sync

Verifying deployment:
kubectl rollout status deploy/traefik -n traefik-system

Traefik updated successfully. All ingress routes operational.

Error Handling

Connection Errors

Symptom: Unable to connect to the server

Causes:

Kubeconfig not set or expired
Cluster not running (Kind)
Network issues (DOKS)

Fix:

# Kind: confirm the cluster exists, then re-export its kubeconfig
kind get clusters
kind export kubeconfig --name platxa

# DOKS: Refresh credentials
doctl kubernetes cluster kubeconfig save <cluster-id>

Permission Denied

Symptom: forbidden: User cannot <action>

Causes:

RBAC role not bound
Service account missing permissions

Fix: Check and apply RBAC:

kubectl auth can-i <verb> <resource> -n <namespace>
# If denied, apply appropriate RoleBinding

Resource Not Found

Symptom: NotFound: <resource> not found

Causes:

Wrong namespace
Resource deleted
Typo in name

Fix: Verify resource exists:

kubectl get <resource-type> -A | grep <name>
kubectl get ns | grep instance

Pod Stuck Pending

Symptom: Pod in Pending state

Causes:

Insufficient resources
PVC not bound
Node selector mismatch

Fix:

kubectl describe pod <pod> -n <namespace>
# Check Events section for scheduling failure reason
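The first three symptoms in this section can be recognized from the error text that kubectl prints. A minimal sketch, assuming the message is passed in as a string; the function name `classify_error` and its labels are hypothetical.

```shell
# Match a kubectl error message against the symptoms described above.
classify_error() {
    case "$1" in
        *"Unable to connect to the server"*) echo "connection" ;;
        *forbidden*)                          echo "permission" ;;
        *NotFound*|*"not found"*)             echo "not-found" ;;
        *)                                    echo "other" ;;
    esac
}
```

A Pending pod produces no error message, so it is not covered here; it shows up in the STATUS column and in the Events section of `kubectl describe pod` instead.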

Safety

Read-Only Operations (Safe)

get, describe, logs, events - No cluster changes
diff - Preview only, no apply

Write Operations (Caution)

scale - Changes replica count
sync - Applies Helm releases
delete - Removes resources

Dangerous Operations (Require Confirmation)

kubectl delete ns - Deletes entire namespace
helmfile destroy - Removes all releases
kubectl drain - Cordons the node and evicts its pods

Always preview changes with diff before sync. Never run destructive commands without explicit user confirmation.
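These tiers could be enforced with a small gate in front of every command. The sketch below is one possible approach under stated assumptions: `command_risk` and `run_guarded` are hypothetical names, and anything not explicitly recognized as read-only is treated as a write, so some harmless commands (for example `helm list -A`) are conservatively labeled "caution".

```shell
# Classify a command line into the safety tiers described above.
# Unrecognized commands default to "caution" rather than "safe".
command_risk() {
    case "$1" in
        "kubectl delete ns "*|"kubectl delete namespace "*|"kubectl drain "*|helmfile*" destroy"*)
            echo "dangerous" ;;
        "kubectl get "*|"kubectl describe "*|"kubectl logs "*|helmfile*" diff"*)
            echo "safe" ;;
        *)
            echo "caution" ;;
    esac
}

# Show the command and its tier, then require a typed "yes"
# before running anything classified as dangerous.
run_guarded() {
    risk=$(command_risk "$1")
    echo "+ $1  [$risk]"
    if [ "$risk" = "dangerous" ]; then
        printf 'Type "yes" to continue: '
        read -r answer || answer=""
        [ "$answer" = "yes" ] || { echo "Aborted"; return 1; }
    fi
    sh -c "$1"
}
```

Echoing the command first also covers the rule of showing every command before it is executed.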

Output Checklist

After completing an operation, verify:

Command executed successfully (exit code 0)
Output analyzed and interpreted
Issues identified (if any)
Fix recommendations provided
Follow-up actions suggested
User can proceed with confidence
