Triaging a page from the linked Datadog monitor

After acknowledging a page, the engineer follows the linked Datadog monitor and starts triage on its detail page. They expand the time range, read the actual error messages and toggle deployment markers before deciding whether the issue is a code change, an upstream failure or a traffic event.

Category
Tags
datadog, triage, logs-sample, deployment-markers, investigation
Standard operating procedure
Step-by-step instructions to reproduce this pattern.
1

Datadog

Land on the monitor detail page from the PagerDuty link.

Confirm the monitor name and the alert message match the PagerDuty incident. If the link drops you on the monitors list rather than the detail page, the PagerDuty link has expired or been edited and you should re-open the incident detail page.

Expected: The Datadog monitor detail page is open with the alert state at the top and the metric graph scoped to a 15 minute window centred on the breach.

2

Datadog

Click the time range selector and switch from 15m to 1h.

The 15 minute view is too narrow to show the baseline. The hour around the spike reveals slow climbs, deploy-adjacent dips and recovery patterns from earlier flaps. If the spike is still sudden in the 1h view, that alone narrows the cause to a discrete event such as a deploy or a config change.

Expected: The metric graph re-renders with the breach roughly in the middle, the baseline visible to the left and any recovery to the right.
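
The sudden-versus-gradual judgement made on the 1h graph can be sketched in code. This is a minimal illustration, not part of the Datadog product: the function name, the 10 minute lookback and the "halfway to threshold" rule are all assumptions chosen for the example, and the input is a plain list of per-minute samples that you would have to export yourself.

```python
from statistics import median


def classify_onset(values, breach_index, threshold, lookback=10):
    """Label the onset of a breach as 'sudden', 'gradual' or 'unknown'.

    values: per-minute metric samples across the 1h window, oldest first.
    breach_index: index of the first sample above the alert threshold.
    threshold: the monitor's alert threshold.
    lookback: minutes before the breach to inspect (assumed value).
    """
    start = max(breach_index - lookback, 0)
    pre_breach = values[start:breach_index]   # the minutes just before the breach
    baseline_window = values[:start]          # everything earlier in the hour
    if not pre_breach or not baseline_window:
        return "unknown"                      # not enough history to judge

    baseline = median(baseline_window)
    # Assumed rule: a slow climb means at least half of the pre-breach samples
    # already sit more than halfway between the baseline and the threshold.
    halfway = baseline + (threshold - baseline) / 2
    elevated = sum(1 for v in pre_breach if v > halfway)
    return "gradual" if elevated * 2 >= len(pre_breach) else "sudden"


flat_then_spike = [1.0] * 30 + [20.0] * 5
slow_climb = [1.0] * 20 + [2, 3, 4, 5, 6, 7, 8, 9, 9.5, 9.9] + [12.0] * 5
print(classify_onset(flat_then_spike, 30, 10.0))
print(classify_onset(slow_climb, 30, 10.0))
```

A "sudden" result supports looking for a discrete event; a "gradual" one suggests checking for load growth or resource exhaustion first.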

3

Datadog

Click the 'Logs Sample' tab below the metric graph.

The Logs Sample is filtered to the same scope as the monitor (service, env, tags) and the same time window. Read at least the top three log lines, not just the most recent one. Repeating error messages with the same trace pattern point to a deterministic bug; varied messages with different traces point to an upstream or infrastructure issue.

Expected: A list of 10 to 50 log lines from the failing scope is displayed, each one linked to its trace and host.
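
The repeating-versus-varied heuristic can be approximated on a list of copied log messages. The sketch below is illustrative only: the function names, the normalisation rules and the 60% dominance cut-off are assumptions, and it compares message text only, whereas the step above also considers trace patterns.

```python
import re
from collections import Counter


def normalise(message):
    """Replace volatile tokens so repeats of the same error compare equal."""
    # Long hex strings (trace ids, addresses) become a placeholder.
    message = re.sub(r"0x[0-9a-fA-F]+|\b[0-9a-fA-F]{8,}\b", "<id>", message)
    # Any remaining digit runs (request ids, durations, ports) as well.
    message = re.sub(r"\d+", "<n>", message)
    return message.strip().lower()


def classify_logs(messages, dominance=0.6):
    """Return a hint about the failure type from a sample of log messages."""
    if not messages:
        return "no-data"
    counts = Counter(normalise(m) for m in messages)
    _, top_count = counts.most_common(1)[0]
    # One pattern dominating the sample suggests a deterministic bug.
    if top_count / len(messages) >= dominance:
        return "deterministic-bug"
    return "upstream-or-infra"


# Made-up sample messages, for illustration only.
repeating = [
    "NullPointerException at OrderMapper.java:42 request 1001",
    "NullPointerException at OrderMapper.java:42 request 1002",
    "NullPointerException at OrderMapper.java:42 request 1003",
]
varied = [
    "connection reset by peer",
    "upstream timed out after 3000 ms",
    "dns lookup failed for db-7",
]
print(classify_logs(repeating))
print(classify_logs(varied))
```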

4

Datadog

Toggle the Deployment Markers overlay using the small flag icon above the metric graph.

Deployment markers come from the deployment_id and version tags emitted by the CI pipeline, so they line up with GitHub releases. If a marker sits within 5 minutes of the spike start, the deploy is the prime suspect. The 5 minute buffer is needed because metrics aggregate in 1 minute buckets and CDN cache TTL adds further delay, so the spike rarely lines up exactly with the marker.

Expected: Vertical markers appear on the metric graph at deployment times, each tagged with the version that shipped.
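
The 5 minute rule is simple enough to express as a check. The following is a sketch under stated assumptions: the function name and the (timestamp, version) tuple shape are hypothetical, the sample data is invented, and the comparison uses the absolute gap because 1 minute bucketing can place the spike start slightly before the marker.

```python
from datetime import datetime, timedelta, timezone

BUFFER = timedelta(minutes=5)  # the buffer described in the step above


def suspect_deploys(spike_start, deploys, buffer=BUFFER):
    """Return versions whose deploy time is within `buffer` of the spike start.

    deploys: iterable of (deployed_at, version) tuples with timezone-aware
    datetimes (a hypothetical shape, not a Datadog API response).
    """
    return [
        version
        for deployed_at, version in deploys
        if abs(spike_start - deployed_at) <= buffer
    ]


# Invented example data.
spike = datetime(2026, 1, 1, 14, 35, tzinfo=timezone.utc)
deploys = [
    (datetime(2026, 1, 1, 13, 50, tzinfo=timezone.utc), "v1.4.1"),
    (datetime(2026, 1, 1, 14, 32, tzinfo=timezone.utc), "v1.4.2"),
]
print(suspect_deploys(spike, deploys))
```

An empty result does not clear every deploy; it only means no marker falls inside the buffer, which shifts attention to upstream failures or traffic events.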

5

Datadog

Open the affected service's APM page in a new tab using the 'View Service' link in the monitor footer.

Use the View Service link rather than searching from the APM index. The link preserves the time range and environment filters so the latency, error rate and throughput panels are pre-scoped to the incident.

Expected: The APM service page opens in a new tab showing the latency, error rate and throughput panels over the same 1 hour window.

6

Datadog

Note the failing service name, the dominant error class and the spike start time in a scratch note for the next step.

These three facts are what the GitHub correlation step needs. Capturing them now prevents bouncing back to Datadog while reading PR diffs.

Expected: You have the service name, the error class and a UTC timestamp of the spike start ready to use.
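
If the scratch note is kept in a script rather than a text file, a small record type keeps the three facts together and forces the timestamp into UTC. This is only a sketch: the class name, field names and the one-line output format are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TriageNote:
    """The three facts carried into the GitHub correlation step."""
    service: str
    error_class: str
    spike_start: datetime

    def __post_init__(self):
        # Reject naive timestamps so local time is never mistaken for UTC.
        if self.spike_start.tzinfo is None:
            raise ValueError("spike_start must be timezone-aware")
        # Frozen dataclass: use object.__setattr__ to store the UTC value.
        object.__setattr__(
            self, "spike_start", self.spike_start.astimezone(timezone.utc)
        )

    def as_line(self):
        """Render the note as a single line for pasting."""
        stamp = self.spike_start.strftime("%Y-%m-%dT%H:%M:%SZ")
        return f"{self.service} | {self.error_class} | {stamp}"


# Invented example values.
note = TriageNote(
    service="api-gateway",
    error_class="HTTP 5xx",
    spike_start=datetime(2026, 1, 1, 14, 35, tzinfo=timezone.utc),
)
print(note.as_line())
```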

Supporting actions
Actions that provide evidence for this pattern.
Triaged PD-PT4ZXKR: api-gateway 5xx spike, logs sample showed retry storm
Datadog: expanded api-gateway monitor to 1h, deploy marker visible
Read logs sample on checkout-service monitor for PT5MN8L
Triaged billing-worker timeout monitor, deploy marker at 14:32
Toggled deploy markers on payments-service monitor, no marker near spike
Metadata
Timestamps and identifiers.
Evidence: Observed 58 times across 5 connections
Applications: Datadog
First seen: 23 Jan 2026, 11:02
Last seen: 6 May 2026, 22:48