Notes on building reliable systems
2026-01-20
A practical checklist: observability, deploy safety, and operational calm.
Here’s a lightweight checklist I like to use:
- Measure what matters (service-level objectives, or SLOs)
- Make deployments boring (small changes, safe rollback)
- Treat incidents as learning opportunities
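One way to make "measure what matters" concrete is an error budget. An availability SLO of 99.9% means up to 0.1% of requests may fail in the measurement window, and the budget tracks how much of that allowance has been spent. The sketch below assumes a simple request-based availability SLO. The function name `error_budget` and the example numbers are hypothetical illustrations, not taken from any particular tool.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Report how much of a request-based availability error budget is used.

    Hypothetical helper: slo_target is the fraction of requests that must
    succeed (e.g. 0.999 for 99.9%).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # Number of failures the SLO tolerates over this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Fraction of that allowance already spent (1.0 means exhausted).
    consumed = failed_requests / allowed_failures
    return {
        "allowed_failures": allowed_failures,
        "failed_requests": failed_requests,
        "budget_consumed": consumed,
        "budget_remaining": max(0.0, 1.0 - consumed),
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 99.9% target, 250 failures.
    report = error_budget(0.999, 1_000_000, 250)
    print(f"{report['budget_consumed']:.0%} of the error budget used")
```

A budget framed this way gives a shared signal for the deploy question, too. While plenty of budget remains, shipping small changes is cheap. When it runs low, that is a cue to slow down.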
Observability
- Logs
- Metrics
- Traces
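The three signals are most useful when they can be joined. A common approach is to emit logs as structured records that carry the same trace ID as the request's trace. Numeric fields such as a duration can also be aggregated into metrics later. The sketch below uses only the Python standard library. The logger name `checkout`, the helper `log_event`, and the field names are hypothetical. In a real system the trace ID would normally come from a tracing library rather than being passed in by hand.

```python
import json
import logging
import time

# Hypothetical service logger; configure handlers/levels as your app requires.
logger = logging.getLogger("checkout")


def log_event(message: str, trace_id: str, **fields) -> str:
    """Emit one JSON log line tagged with a trace ID and return it.

    Sharing trace_id with the trace lets a log line be looked up from
    the trace (and vice versa); numeric fields like duration_ms can be
    aggregated into metrics downstream.
    """
    record = {
        "ts": time.time(),   # Unix timestamp in seconds
        "msg": message,
        "trace_id": trace_id,
        **fields,            # extra context, e.g. duration_ms, status
    }
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    # Stand-in trace ID; a tracing library would normally supply this.
    log_event("order placed", trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
              duration_ms=42, status="ok")
```

Emitting JSON instead of free-form text keeps every field machine-searchable, which is what makes jumping from a metric spike to the matching logs and traces practical.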