Incident Review Without Theater
A useful incident review turns a messy production event into fewer surprises next time.
Personal notes on site reliability engineering, infrastructure, and building things that last.
Cost control gets easier when billing signals show up where engineers already work.
Dashboards and alerts should reduce decision time, not decorate a wall of screens.
Notes on introducing GitOps gradually without turning every deployment into a process migration.
The upgrade checklist I want nearby before moving Kubernetes node pools through production versions.
A practical way to choose service level objectives when the platform team is small, busy, and still responsible for production confidence.
A practical look at how this site is built — Astro Content Collections, static output, Cloudflare Pages deployment, and the few decisions that made it worth the setup.
An introduction to LONG R&D — a personal space for writing about site reliability engineering, infrastructure, and building systems that last.