Hand over platform on-call duties
Use this procedure when a platform admin rotates off primary on-call: a planned rotation change, a departure (run this before Revoke platform admin access), or a mid-shift emergency handover. The goal is that no open incident, pinned ticket, or in-flight mitigation is left without an owner when the outgoing admin’s access ends.
Procedure
- Freeze new PagerDuty assignments. In PagerDuty, set the schedule override so the incoming admin is the primary for the full handover window (minimum 2 hours overlap for planned rotations; immediate for emergencies). Do not remove the outgoing admin’s schedule yet — overlap prevents coverage gaps.
- Walk through open incidents. Open https://novalien.pagerduty.com/incidents?status=triggered,acknowledged filtered to the platform-ops service. For each incident:
  - Outgoing admin reassigns the incident to the incoming admin.
  - Outgoing admin posts a one-paragraph status: current hypothesis, last action taken, what the next action is.
  - Incoming admin acknowledges the reassignment explicitly.
  No silent handover. A reassignment without a status paragraph is not a handover.
- Hand over pinned support tickets. Open the support tracker view assignee=<outgoing> AND pinned=true. For each:
  - If the ticket is waiting on a customer reply, reassign and add a waiting_on_customer label.
  - If the ticket is waiting on engineering, reassign and link the engineering issue.
  - If the ticket is active and has an open impersonation session, end the impersonation first (US-PLAT-10). The incoming admin mints a fresh token when they pick up the ticket; impersonation tokens are per-human, never transferred.
- Transfer feature-flag rollouts in flight. Open the feature-flag console and filter to rollouts in ramping or paused state. The outgoing admin annotates each with the current ramp percentage, the last observed error rate, and the next planned ramp step. Incoming admin takes ownership of the annotation.
- Post the handover summary. In #novavms-platform-ops: “On-call: @outgoing → @incoming as of. N incidents reassigned, M tickets reassigned, K rollouts in flight. Details in the thread.” Incoming admin thumbs-up to acknowledge.
- Audit check. If the handover is part of an offboarding, verify platform_audit_log shows platform.impersonation_ended for every session the outgoing admin closed in step 3. Any left-open impersonation at the moment of disable is caught by the disable itself but is noisier to reason about later; better to close cleanly.
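The two checks in this procedure that are easiest to get wrong by eye are the "no silent handover" rule in step 2 and the audit check in step 6. The following is a minimal sketch of both, not an existing tool. The `IncidentHandover` record, the function names, and the `(event_name, session_id)` export format for platform_audit_log are all assumptions made for illustration; only the event name platform.impersonation_ended comes from the procedure.

```python
from dataclasses import dataclass


@dataclass
class IncidentHandover:
    """One incident's handover record (hypothetical shape, not a PagerDuty type)."""
    incident_id: str
    reassigned: bool        # outgoing admin reassigned the incident
    status_paragraph: str   # current hypothesis, last action, next action
    acknowledged: bool      # incoming admin explicitly acknowledged


def silent_handovers(records):
    """Return IDs of incidents that fail the 'no silent handover' rule (step 2)."""
    return [
        r.incident_id
        for r in records
        if not (r.reassigned and r.status_paragraph.strip() and r.acknowledged)
    ]


def sessions_missing_end_event(closed_session_ids, audit_events):
    """Return sessions closed in step 3 with no platform.impersonation_ended event.

    audit_events is an iterable of (event_name, session_id) pairs, which is an
    assumed export format for platform_audit_log.
    """
    ended = {
        session_id
        for name, session_id in audit_events
        if name == "platform.impersonation_ended"
    }
    return sorted(set(closed_session_ids) - ended)
```

An empty result from both functions means the handover is clean on these two points. A usage example with made-up identifiers:

```python
records = [
    IncidentHandover("INC-1", True, "Suspect cache stampede; restarted workers; next: watch p99.", True),
    IncidentHandover("INC-2", True, "", True),  # reassigned but no status paragraph
]
print(silent_handovers(records))

events = [("platform.impersonation_ended", "s1")]
print(sessions_missing_end_event(["s1", "s2"], events))
```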
Common variations
- Emergency (outgoing admin incapacitated or compromised): skip the status-paragraph requirement; the incoming admin reads PagerDuty history, ticket comments, and the last hour of #novavms-platform-ops to reconstruct context. Flag any unclear thread in the channel for a team-wide catch-up.
- Mid-shift outage: both admins are on simultaneously. Do the formal handover after the outage is mitigated; in the moment, the senior admin leads and the other shadows.
- Routine weekly rotation: the 2-hour overlap is standard and is covered by PagerDuty auto-rotation. The status-paragraph per open incident is still required — it is the one step that does not automate.
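The routine and emergency variations differ in two requirements: the minimum overlap and whether the per-incident status paragraph is mandatory. One way to keep that difference explicit is a small policy table, sketched below. The `HandoverPolicy` type, the `"routine"` and `"emergency"` keys, and the `unmet_requirements` helper are illustrative names, not part of any existing tooling. The mid-shift outage case is left out because it defers the formal handover rather than changing its requirements.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class HandoverPolicy:
    """Requirements that differ between handover variations (hypothetical type)."""
    min_overlap: timedelta           # minimum time both admins stay on the schedule
    status_paragraph_required: bool  # per-incident status paragraph from step 2


# Keys are illustrative labels, not identifiers from any tool.
POLICIES = {
    "routine": HandoverPolicy(timedelta(hours=2), True),
    "emergency": HandoverPolicy(timedelta(0), False),
}


def unmet_requirements(kind, overlap, incidents_without_status):
    """Return a list of human-readable problems; an empty list means none found."""
    policy = POLICIES[kind]
    problems = []
    if overlap < policy.min_overlap:
        problems.append(f"overlap {overlap} is shorter than {policy.min_overlap}")
    if policy.status_paragraph_required and incidents_without_status:
        problems.append(
            f"{incidents_without_status} incident(s) lack a status paragraph"
        )
    return problems
```

For example, `unmet_requirements("routine", timedelta(hours=1), 0)` reports the short overlap, while `unmet_requirements("emergency", timedelta(0), 3)` reports nothing because the emergency variation waives both requirements.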
Related
- Revoke platform admin access — run this handover first when a departure is driving the revocation.
- Incident response runbook — what “open incident” means in step 2.
- Scoped impersonation — why tokens do not transfer between humans.