askill
gke-workload-scaling

gke-workload-scalingSafety 85Repository

Specific workflows for scaling GKE workloads using HPA and VPA, as well as best practices for autoscaling configuration.

121 stars
2.4k downloads
Updated 2/6/2026

Package Files

Loading files...
SKILL.md

GKE Workload Scaling

This skill provides workflows and best practices for scaling applications on Google Kubernetes Engine (GKE). It covers manual scaling, Horizontal Pod Autoscaling (HPA), and Vertical Pod Autoscaling (VPA).

Workflows

1. Manual Scaling

Quickly scale a deployment to a fixed number of replicas. Useful for immediate manual intervention or testing.

Command:

kubectl scale deployment <deployment-name> --replicas=<number> -n <namespace>

2. Horizontal Pod Autoscaling (HPA)

Automatically scale the number of pods based on observed CPU utilization, memory utilization, or custom metrics.

Prerequisites:

  • Metrics Server must be running (enabled by default on GKE).
  • Containers clearly define resource requests/limits.

Quick Command:

kubectl autoscale deployment <deployment-name> --cpu-percent=50 --min=1 --max=10

Manifest Approach (Recommended): Use a YAML manifest for version-controlled configuration. See assets/hpa-example.yaml for a template.

kubectl apply -f assets/hpa-example.yaml

3. Vertical Pod Autoscaling (VPA)

Automatically adjust the CPU and memory reservations for your pods to match actual usage. This is critical for right-sizing workloads.

Prerequisites:

  • VPA must be enabled on the cluster.
    • Autopilot: Enabled by default.
    • Standard: Must be enabled manually.

Enable VPA on Standard Cluster:

gcloud container clusters update <cluster-name> --enable-vertical-pod-autoscaling --zone <zone>

Update Modes:

  • Off: Calculates recommendations but does not apply them. Good for "dry run" analysis.
  • Initial: Assigns resources only at pod creation time.
  • Auto: Updates running pods by restarting them if recommendations differ significantly from requests.
  • InPlaceOrRecreate: Attempts to update Pod resources without recreating the Pod. If in-place update is not possible, it reverts to Auto mode (requires GKE 1.34+).

Example: See assets/vpa-example.yaml for a configuration template.

4. Cluster Autoscaler

While not a workload-level scaler, the Cluster Autoscaler is essential for ensuring your cluster has enough nodes to run the scaled pods.

Enable on a Node Pool:

gcloud container clusters update <cluster-name> \
    --enable-autoscaling \
    --node-pool <node-pool-name> \
    --min-nodes <min> \
    --max-nodes <max> \
    --zone <zone>

Best Practices

  1. Define Resource Requests: HPA and VPA rely on accurate resource requests. Always define them in your container specs.
  2. Avoid Metric Conflicts: Do not configure HPA and VPA to use the same metric (e.g., both CPU). This causes thrashing.
    • Typical Pattern: HPA on CPU, VPA on Memory.
  3. Pod Disruption Budgets (PDBs): Define PDBs to ensure application availability during scaling events or node upgrades.
  4. HPA Lag: HPA has a stabilization window (default 5 mins) to prevent rapid fluctuation.
  5. VPA "Auto" Mode Risks: In "Auto" mode, VPA restarts pods to change resources. Ensure your application handles restarts gracefully (e.g., handles SIGTERM).
    • Note: By default, VPA requires at least 2 replicas to perform evictions. In GKE 1.22+, you can override this by setting minReplicas in PodUpdatePolicy.

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

78/100Analyzed 2/20/2026

Well-structured skill covering GKE workload scaling (manual, HPA, VPA, Cluster Autoscaler) with practical commands, prerequisites, and best practices. Scores high on clarity and safety. Minor deductions for referencing external template files that may not exist, and for being somewhat GKE-specific rather than generic Kubernetes. Contains good actionability with clear commands and structured workflows.

85
85
70
72
82

Metadata

Licenseunknown
Version-
Updated2/6/2026
PublisherGoogleCloudPlatform

Tags

observability