bRRAIn Exchange

← Back to browse · standard · v1.0.0

Operations Runbook

Step-by-step runbook for routine operational procedures with rollback and escalation paths.

# Runbook — [Procedure name] **Procedure ID:** RB-[NNN] **Owner:** [Team] **Last reviewed:** [Date] **Next review due:** [Date] **Audience:** [On-call, deployer, customer-success agent, etc.] ## When to use this runbook A clear trigger condition. If the trigger is "the alert that names this runbook fires", say so. If the trigger requires human judgement ("response time looks elevated"), spell out the thresholds. ## Pre-checks (do not skip) - [ ] Confirm I have [tool access / VPN / paging permissions] - [ ] Confirm the incident isn't already being handled — check [channel] - [ ] Confirm change-freeze window doesn't apply (or escalate to break it) ## Steps Steps are imperative, single-action sentences. No "or" branches in a single step — split into separate steps. 1. Open [tool] and navigate to [view]. 2. Run the diagnostic: `[command]`. Expected output: [pattern]. 3. If output matches "case A", proceed to "Case A" section. If "case B", proceed to "Case B" section. Otherwise, escalate per "Escalation". ### Case A: [Symptom description] 1. [action] 2. [action] 3. Verify the fix by [check]. Expected result: [observable signal]. ### Case B: [Symptom description] 1. [action] 2. [action] 3. Verify the fix by [check]. ## Verification After applying any case above, confirm: - [ ] [Health-check 1] - [ ] [Health-check 2] - [ ] [Customer-facing endpoint returns 200] ## Rollback If the steps above made things worse: 1. [first reversible action] 2. [next reversible action] 3. [final restore-from-backup step] ## Escalation - Primary on-call: [pager schedule link] - Service owner: [individual] - Manager: [individual] - After 30 minutes with no resolution: page [next level] ## References - Architecture diagram: [link] - Related runbooks: [links] - Dashboards: [links] - Postmortems of past occurrences: [links]

Price
Free
25% platform fee
State
approved