
Incident Response Playbooks That Actually Work

Incidents are stressful. Good playbooks reduce chaos and improve outcomes. Here's how we built runbooks teams actually use during outages.

The Playbook Problem

Most runbooks are walls of text that engineers ignore during incidents. They're outdated, generic, or buried in wikis. During a production outage, cognitive load is high—complex docs don't help.

Playbook Principles

**1. Start with symptoms, not systems.** Engineers see "API errors spiking"—not "database connection pool exhausted." Map user-facing symptoms to likely causes.

**2. Use checklists, not paragraphs.** Bullet points with clear actions: "Check X, run Y command, escalate if Z."

**3. Include exact commands.** Don't say "check logs"; give the command itself: `kubectl logs -n prod deployment/api-server --tail=100`

**4. Test regularly.** Run game days to practice playbooks. Update based on what actually worked.
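As a rough illustration of the first three principles, a playbook index can be keyed by the symptom the on-call engineer sees and return a short checklist with exact commands attached. This is a minimal sketch, not our actual tooling: `Step`, `PLAYBOOK_INDEX`, `checklist_for`, and the escalation step are hypothetical names and content chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str        # what to do, phrased as an imperative
    command: str = ""  # exact command to paste, if the step has one


# Hypothetical index: keys are user-facing symptoms, not internal systems.
PLAYBOOK_INDEX = {
    "api errors spiking": [
        Step("Tail recent API logs",
             "kubectl logs -n prod deployment/api-server --tail=100"),
        Step("Check pod CPU/memory saturation", "kubectl top pods -n prod"),
        Step("Escalate if errors persist after the checks above"),
    ],
}


def checklist_for(symptom: str) -> list[str]:
    """Render the checklist for a symptom as numbered, copy-pasteable lines."""
    steps = PLAYBOOK_INDEX.get(symptom.strip().lower(), [])
    lines = []
    for number, step in enumerate(steps, start=1):
        suffix = f": {step.command}" if step.command else ""
        lines.append(f"{number}. {step.action}{suffix}")
    return lines


if __name__ == "__main__":
    print("\n".join(checklist_for("API errors spiking")))
```

Looking up by symptom keeps the entry point aligned with what the alert or user report actually says; an unknown symptom returns an empty list rather than guessing.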

Structure

Symptom Detection: How this issue presents (alerts, metrics, user reports)

Immediate Actions: Stop the bleeding (scale up, failover, disable feature)

Investigation: Commands to gather data (logs, traces, metrics queries)

Resolution: Step-by-step fix with verification

Communication: Templates for status page updates and stakeholder notifications
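One way to keep every playbook in this five-part shape is to store it as structured data and check for empty sections before it is published. The sketch below assumes playbooks are represented in Python; the `Playbook` class and `missing_sections` helper are hypothetical names, and the sample entries are illustrative only.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Playbook:
    title: str
    symptom_detection: list[str] = field(default_factory=list)
    immediate_actions: list[str] = field(default_factory=list)
    investigation: list[str] = field(default_factory=list)
    resolution: list[str] = field(default_factory=list)
    communication: list[str] = field(default_factory=list)


def missing_sections(playbook: Playbook) -> list[str]:
    """Return the names of sections that have no entries yet."""
    return [
        f.name
        for f in fields(playbook)
        if f.name != "title" and not getattr(playbook, f.name)
    ]


if __name__ == "__main__":
    draft = Playbook(
        title="API Latency Spike",
        symptom_detection=["Latency alert on affected endpoints"],
        investigation=["kubectl top pods -n prod"],
    )
    # Lists the sections still to be written, in definition order.
    print(missing_sections(draft))
```

A check like this could run in CI for the playbook repository so that a draft missing, say, its communication templates is caught before an incident rather than during one.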

Example: API Latency Spike

1. Check the Grafana dashboard for affected endpoints.
2. Run `kubectl top pods -n prod` and look for CPU/memory saturation.
3. If DB connections are maxed, scale API replicas: `kubectl scale deployment api-server -n prod --replicas=10`
4. Check recent deployments: `kubectl rollout history deployment/api-server -n prod`
5. If there was a recent deploy, roll back: `kubectl rollout undo deployment/api-server -n prod`
6. Update the status page: "Investigating elevated API latency"
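The conditional steps in this example can also be written down as a small decision helper, which makes the branching explicit and easy to rehearse on a game day. This is a dry-run sketch under stated assumptions: it only returns command strings and never executes them, `remediation_commands` is a hypothetical helper, and the `-n prod` namespace flag is carried over from the logs command earlier in this post as an assumption.

```python
def remediation_commands(
    db_connections_maxed: bool,
    recent_deploy: bool,
    replicas: int = 10,
) -> list[str]:
    """Map the observations from the checklist to the commands it prescribes.

    Nothing is executed here; the caller decides whether to run each command.
    """
    commands = []
    if db_connections_maxed:
        # Checklist step 3: scale out the API deployment.
        commands.append(
            f"kubectl scale deployment api-server -n prod --replicas={replicas}"
        )
    if recent_deploy:
        # Checklist step 5: roll back the most recent rollout.
        commands.append("kubectl rollout undo deployment/api-server -n prod")
    return commands


if __name__ == "__main__":
    for command in remediation_commands(db_connections_maxed=True, recent_deploy=True):
        print(command)
```

Keeping the helper side-effect free means it can be exercised in tests and drills without touching a cluster.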

Maintenance

Review playbooks quarterly. After each incident, update the relevant playbook with lessons learned. Archive playbooks for deprecated systems.
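A quarterly review is easier to keep up when something flags overdue playbooks automatically. The sketch below assumes each playbook records a last-reviewed date and approximates "quarterly" as 90 days; `REVIEW_INTERVAL`, `overdue_playbooks`, and the playbook names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "quarterly" is approximated as 90 days.
REVIEW_INTERVAL = timedelta(days=90)


def overdue_playbooks(last_reviewed: dict[str, date], today: date) -> list[str]:
    """Return playbook names whose last review is more than 90 days old, sorted."""
    return sorted(
        name
        for name, reviewed in last_reviewed.items()
        if today - reviewed > REVIEW_INTERVAL
    )


if __name__ == "__main__":
    today = date.today()
    reviews = {
        "api-latency-spike": today - timedelta(days=120),  # hypothetical entry
        "db-failover": today - timedelta(days=30),         # hypothetical entry
    }
    print(overdue_playbooks(reviews, today))
```

The last-reviewed dates could come from wherever the playbooks live, for example a metadata field in each file; that source is left open here.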

Result: Mean time to resolution dropped 40% after implementing structured playbooks. New engineers onboard faster with clear procedures.