GKE Upgrade Runbook Notes
The checklist I want nearby before upgrading production Kubernetes node pools to a new version.
Kubernetes upgrades are rarely difficult because of the version number alone. They become difficult when ownership, drain behavior, disruption budgets, and rollback paths are unclear.
Before the window
Confirm deprecated APIs, node image changes, autoscaler behavior, ingress controller compatibility, and any workloads with strict disruption budgets. The goal is to know which failure modes are expected before the control plane starts moving.
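One of these checks, finding workloads with strict disruption budgets, can be sketched in a few lines. This is a minimal illustration, not a complete preflight: it assumes the budgets were exported with something like `kubectl get pdb -A -o json`, and it reads the `status.disruptionsAllowed` field from each item. The sample data and the function name `blocking_pdbs` are made up for the example.

```python
import json

# Trimmed sample in the shape of `kubectl get pdb -A -o json`.
# The namespaces and names are invented for illustration.
SAMPLE_PDB_LIST = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "checkout", "name": "api"},
     "status": {"currentHealthy": 3, "desiredHealthy": 3, "disruptionsAllowed": 0}},
    {"metadata": {"namespace": "search", "name": "indexer"},
     "status": {"currentHealthy": 4, "desiredHealthy": 3, "disruptionsAllowed": 1}}
  ]
}
""")


def blocking_pdbs(pdb_list):
    """Return "namespace/name" for every budget that currently allows zero disruptions."""
    blocked = []
    for item in pdb_list.get("items", []):
        meta = item.get("metadata", {})
        status = item.get("status", {})
        # A missing status is treated as blocking so that a person looks at it.
        if status.get("disruptionsAllowed", 0) == 0:
            blocked.append(f'{meta.get("namespace")}/{meta.get("name")}')
    return blocked


if __name__ == "__main__":
    for name in blocking_pdbs(SAMPLE_PDB_LIST):
        print(f"node drain may stall on PDB {name}")
```

A budget that allows zero disruptions is not necessarily wrong, but it is worth knowing about before the first node is drained rather than discovering it mid-window.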
During the window
Upgrade one small, low-risk node pool first and watch scheduling pressure, pod churn, error rate, and request latency. A cluster that reports the new version is not proof that the application is healthy. The application signals matter more than the upgrade progress bar.
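The idea of gating on application signals rather than upgrade progress can be written down as a small check that runs after the first pool finishes. The sketch below is hypothetical: the `Signals` fields, the `gate` function, and the default thresholds are placeholders chosen for illustration, and the real numbers would come from whatever monitoring system and error budget the team already uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float   # 99th percentile request latency
    pending_pods: int       # pods that could not be scheduled


def gate(baseline, current,
         max_error_increase=0.005,
         max_latency_ratio=1.25,
         max_pending_pods=0):
    """Return the reasons to pause the rollout; an empty list means continue."""
    reasons = []
    if current.error_rate - baseline.error_rate > max_error_increase:
        reasons.append("error rate rose above the allowed increase")
    if current.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio:
        reasons.append("p99 latency exceeded the allowed ratio")
    if current.pending_pods > max_pending_pods:
        reasons.append("pods are still pending after the drain")
    return reasons


if __name__ == "__main__":
    before = Signals(error_rate=0.001, p99_latency_ms=200.0, pending_pods=0)
    after_first_pool = Signals(error_rate=0.02, p99_latency_ms=400.0, pending_pods=3)
    for reason in gate(before, after_first_pool):
        print(f"pause: {reason}")
```

Comparing against a baseline captured just before the window, rather than against fixed absolute limits, keeps the check meaningful for services whose normal latency or error rate differs widely.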
After the window
Record what surprised the team. Upgrade runbooks get better when they capture the small operational details: which workloads drained slowly, which alerts were noisy, and which checks were missing.