Skip to content

Resolving an incident and recording the timeline

Once metrics have been healthy for at least 15 minutes, the engineer resolves the PagerDuty incident with a complete resolution note. That note names the cause, attaches the GitHub PR numbers and Datadog dashboard URL and tags the affected service plus a root cause classification.

Category
Tags
pagerdutyresolutionpost-mortemtaggingroot-cause
What and why
The observed behaviour and the reasoning behind it.
Behaviour
Reasoning
Cause and effect
What initiates this pattern and what it produces.
Trigger
Outcome
Standard operating procedure
Step-by-step instructions to reproduce this pattern.
1

Datadog

Confirm 15 minutes have elapsed since the metric returned to baseline.

Use the same monitor view used for verification. If the 15 minutes contain any threshold breaches, even brief ones, restart the timer. Flapping incidents are the most common reason the page re-fires within the hour, and 15 minutes is the empirical floor below which the team has been bitten before.

Expected: The metric graph shows a continuous 15 minute window below the alert threshold.

2

PagerDuty

Open the incident and click 'Add Status Update' one final time.

Write the cause and the remediation in two short sentences. Example: 'Cause: api-gateway #4892 changed the retry policy default and increased upstream pressure. Remediation: reverted via #4901, deploy completed at 14:48 UTC.'

Expected: A status update appears on the timeline with the cause and the remediation.

3

PagerDuty

Click the Tags field and apply the affected service tag and a root cause classification tag.

Use the team's standard tag set. Service tags follow the repository name (api-gateway, checkout-service, billing-worker). Root cause classifications are bad-deploy, infra, dependency, traffic, configuration. Stick to the standard set even if it is a slightly imperfect fit. The quarterly report depends on consistency.

Expected: The incident shows the service tag and the root cause classification tag in the sidebar.

4

PagerDuty

Click Resolve and complete the resolution note with PR links and the Datadog dashboard URL.

Paste the suspect PR number, the revert or hotfix PR number and the Datadog monitor URL into the resolution note. The note has a 1500 character limit on desktop and around 500 on mobile, which is why this step is done from a workstation. Anyone opening this incident in the post-mortem will look for these three links first.

Expected: The incident is in Resolved status with a resolution note containing the two PR links and the dashboard URL.

5

PagerDuty

Confirm the resolution event appears at the bottom of the incident timeline.

If the resolution event is missing, the click did not register. This sometimes happens when the page was reassigned earlier and the responder did not re-acquire it. Re-open and resolve from the original responder account.

Expected: The timeline ends with a Resolved event named after the responder.

Related patterns
How this pattern connects to other patterns in the library.
Supporting actions
Actions that provide evidence for this pattern.
Resolved PD-PT4ZXKR with #4892, #4901 and api-gateway dashboard
Tagged PD-PT5MN8L with checkout-service and bad-deploy
Waited 17 min after metric recovery before resolving billing-worker page
Resolution note for PD-Q9HK22N: cause, remediation, 3 links
Re-resolved PD-PT5MN8L after first click did not register on reassigned page
Metadata
Timestamps and identifiers.
EvidenceObserved 33 times across 5 connections
ApplicationsPagerDuty, GitHub, Datadog
First seen2 Feb 2026, 14:21
Last seen5 May 2026, 18:54
Questions

Frequently asked questions

Speak to the founder

Henry Denton, founder of FusedFrames

Get a demo. Watch a live capture, then an AI agent query the result.

Ask anything. Pricing, security or integrating with your stack.

No purchase obligation

Start capturing

Record in minutes. Install once and work as normal.

Plug AI agents in. One API call from any AI agent stack.

Refund on unused credits if you cancel