
GKE Upgrade Runbook Notes

By Hoang-Long Nguyen · March 29, 2026 · Kubernetes, GKE, Runbooks

The upgrade checklist I want nearby before moving Kubernetes node pools through production versions.


Kubernetes upgrades are rarely difficult because of the version number alone. They become difficult when ownership, drain behavior, disruption budgets, and rollback paths are unclear.

Before the window

Check for deprecated API usage, node image changes, autoscaler behavior, ingress controller compatibility, and any workloads with strict disruption budgets. The goal is to know which failure modes to expect before the control plane starts moving.
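One way to make the disruption budget check concrete is to list every PodDisruptionBudget that currently allows zero voluntary disruptions, since those are the ones most likely to stall a node drain. The sketch below is an illustration, not a complete pre-flight check. It assumes the input is the parsed JSON from `kubectl get pdb --all-namespaces -o json`; the function name `blocking_pdbs` and the sample data are made up for this example.

```python
from typing import Any


def blocking_pdbs(pdb_list: dict[str, Any]) -> list[str]:
    """Return "namespace/name" for every PDB that allows zero disruptions."""
    blocked = []
    for item in pdb_list.get("items", []):
        meta = item.get("metadata", {})
        status = item.get("status", {})
        # Assumption: a PDB with no reported status is treated as blocking,
        # so that a person looks at it before the window.
        if status.get("disruptionsAllowed", 0) == 0:
            namespace = meta.get("namespace", "default")
            name = meta.get("name", "?")
            blocked.append(f"{namespace}/{name}")
    return sorted(blocked)


# Hand-written sample data, not output from a real cluster.
sample = {
    "items": [
        {
            "metadata": {"namespace": "payments", "name": "api"},
            "status": {"disruptionsAllowed": 0},
        },
        {
            "metadata": {"namespace": "web", "name": "frontend"},
            "status": {"disruptionsAllowed": 2},
        },
    ]
}

print(blocking_pdbs(sample))
```

Anything this returns is worth discussing with the owning team before the window: either the budget is too strict for the replica count, or the workload is already unhealthy.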

During the window

Upgrade one small node pool first and watch scheduling pressure, pod churn, error rate, and request latency. A green cluster version does not mean the application is healthy. The application signals matter more than the upgrade progress bar.
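The idea of trusting application signals over upgrade progress can be written down as a pause rule agreed on before the window. The sketch below compares current signals with a pre-upgrade baseline and returns the reasons to pause. The `Signals` type, the `should_pause` function, and every threshold are hypothetical placeholders; real values would come from your own monitoring and SLOs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float   # request latency at the 99th percentile
    pending_pods: int       # pods waiting to be scheduled


def should_pause(
    baseline: Signals,
    current: Signals,
    max_error_increase: float = 0.01,   # assumed threshold
    max_latency_ratio: float = 1.5,     # assumed threshold
    max_pending_pods: int = 5,          # assumed threshold
) -> list[str]:
    """Return the signals that breached their threshold (empty means continue)."""
    reasons = []
    if current.error_rate - baseline.error_rate > max_error_increase:
        reasons.append("error rate")
    if (
        baseline.p99_latency_ms > 0
        and current.p99_latency_ms / baseline.p99_latency_ms > max_latency_ratio
    ):
        reasons.append("p99 latency")
    if current.pending_pods > max_pending_pods:
        reasons.append("pending pods")
    return reasons


# Illustrative numbers only, not measurements from a real upgrade.
before = Signals(error_rate=0.002, p99_latency_ms=180.0, pending_pods=0)
during = Signals(error_rate=0.004, p99_latency_ms=320.0, pending_pods=1)

print(should_pause(before, during))
```

Returning the list of reasons rather than a single boolean keeps the decision explainable when someone asks later why the rollout stopped.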

After the window

Record what surprised the team. Upgrade runbooks get better when they capture the small operational details: which workloads drained slowly, which alerts were noisy, and which checks were missing.