Notes on building reliable systems

2026-01-20

A practical checklist: observability, deploy safety, and operational calm.

Here’s a lightweight checklist I like to use:

  1. Measure what matters (SLOs)
  2. Make deployments boring (small changes, safe rollback)
  3. Treat incidents as learning opportunities
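
To make item 1 concrete, here is a minimal sketch of the arithmetic behind an SLO error budget. The function name error_budget_remaining and the request-count framing are my own assumptions for illustration; the checklist itself does not prescribe a formula, and a time-based SLO would be computed differently.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (hypothetical helper).

    1.0 means no budget has been spent, 0.0 means it is exhausted,
    and a negative value means the SLO has been breached.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    if total_requests <= 0:
        # No traffic in the window, so nothing has been spent.
        return 1.0
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures
```

For example, with a 99.9% target over 1,000,000 requests the budget is about 1,000 failures, so 250 failures leaves roughly three quarters of it. A number like this can inform item 2: when the budget is nearly spent, that is a reasonable signal to slow down deployments.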

Observability

  • Logs
  • Metrics
  • Traces
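
One way the three signals can connect, sketched under my own assumptions: a structured log line that carries a trace ID and a duration, so a log entry can be matched to a trace and the duration can feed a latency metric. The helper make_log_line and the field names (event, trace_id, duration_ms) are hypothetical, not a standard schema.

```python
import json
import time
import uuid


def make_log_line(event: str, trace_id: str, duration_ms: float, **fields) -> str:
    """Build one JSON log line (hypothetical helper; field names are illustrative)."""
    record = {
        "ts": time.time(),                      # wall-clock timestamp, in seconds
        "event": event,
        "trace_id": trace_id,                   # links this log line to a trace
        "duration_ms": round(duration_ms, 3),   # can also feed a latency metric
        **fields,
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()
    # ... handle the request here ...
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(make_log_line("request_handled", trace_id, elapsed_ms, status=200))
```

This uses only the standard library. A real setup would more likely take the trace ID from a tracing library than generate one by hand.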