
Verifying a rollback restored service in Datadog

After the rollback or forward fix has deployed, the engineer waits five minutes and then watches the same Datadog monitor for baseline error rate, latency and SLO burn. They also check the service map for downstream consumers still affected by queued errors.

Tags: datadog, verification, slo-burn-rate, service-map, rollback
Standard operating procedure
Step-by-step instructions to reproduce this pattern.
Step 1 (Datadog): Reopen the original monitor detail page from the PagerDuty incident link.

Reusing the PagerDuty link rather than navigating fresh keeps the time range and scope identical to the triage step, which is what makes a true before-and-after comparison possible. A different scope or filter set produces an apples-to-oranges comparison that has tripped the team up before.

Expected: The Datadog monitor detail page opens with the same scope and a freshly extended time range.

Step 2 (Datadog): Set the time range to 'Last 30m'.

30 minutes captures the spike, the rollback deploy and the recovery in a single view. Shorter ranges hide the spike and longer ranges flatten the recovery to the point that small lingering issues become invisible.

Expected: The metric graph shows the spike in the left third, the deploy marker in the middle and the recovery in the right third.
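If you want to pull the same data the graph shows rather than eyeball it, Datadog's v1 metrics query API takes a query string plus a from/to window in epoch seconds. A minimal sketch in Python, assuming a trace-based error-rate query and DD_API_KEY / DD_APP_KEY environment variables; the metric names and the service:api-gateway tag are placeholders for whatever query the monitor actually evaluates.

    import os
    import time

    import requests

    # Placeholder query: errors divided by hits for the affected service.
    ERROR_RATE_QUERY = (
        "sum:trace.http.request.errors{service:api-gateway}.as_rate()"
        " / sum:trace.http.request.hits{service:api-gateway}.as_rate()"
    )

    def fetch_error_rate_points(query, window_s=30 * 60):
        """Return [timestamp_ms, value] points for the last window_s seconds."""
        now = int(time.time())
        resp = requests.get(
            "https://api.datadoghq.com/api/v1/query",
            headers={
                "DD-API-KEY": os.environ["DD_API_KEY"],
                "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
            },
            params={"from": now - window_s, "to": now, "query": query},
            timeout=30,
        )
        resp.raise_for_status()
        series = resp.json().get("series", [])
        return series[0]["pointlist"] if series else []

    points = fetch_error_rate_points(ERROR_RATE_QUERY)
    print(f"{len(points)} datapoints over the last 30 minutes")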

Step 3 (Datadog): Wait at least 5 minutes after the deploy completion time before declaring recovery.

The metric pipeline has roughly a 1 to 2 minute aggregation lag, the rolling deploy itself takes 1 to 3 minutes to reach all instances, and CDN cache TTLs add further delay. Declaring success at minute 2 is regularly wrong. The waiting period feels long during an incident, but it is the cheapest insurance against a re-page.

Expected: At least 5 minutes have elapsed between the deploy success and your verification check.
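The waiting arithmetic is simple but easy to fumble mid-incident. A small sketch that turns the deploy completion timestamp into an earliest safe verification time; the timestamp is a made-up example, take the real one from your deploy tool.

    from datetime import datetime, timedelta, timezone

    # Made-up deploy completion time; read the real one from your deploy tool.
    deploy_completed_at = datetime(2026, 5, 5, 18, 10, tzinfo=timezone.utc)

    # 5 minutes covers the 1-2 min aggregation lag plus the 1-3 min rolling deploy.
    earliest_check = deploy_completed_at + timedelta(minutes=5)
    now = datetime.now(timezone.utc)

    if now < earliest_check:
        remaining = (earliest_check - now).total_seconds()
        print(f"Too early: wait another {int(remaining // 60)} min {int(remaining % 60)} s")
    else:
        print("At least 5 minutes since deploy completion; safe to run the verification checks")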

Step 4 (Datadog): Confirm the error rate is below the monitor's alert threshold and trending down or flat.

Below threshold is necessary but not sufficient on its own: the rate must also be trending down or flat. An error rate that is below threshold but rising is a partial recovery and likely to re-breach. Treat that case as not yet recovered.

Expected: The error rate is below the threshold line and the slope is non-positive over the last 5 minutes.
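To make "below threshold with a non-positive slope" concrete, here is a sketch that applies both checks to the pointlist returned by the query sketch in step 2. The 5-minute window and ordinary least-squares slope are one reasonable reading of the rule, not the monitor's own evaluation, and the threshold value is whatever the monitor alerts on.

    def recovered(points, threshold):
        """points: [timestamp_ms, error_rate] pairs; threshold: the monitor's alert threshold."""
        if not points:
            return False

        # Keep only the last 5 minutes of non-null datapoints.
        cutoff_ms = max(t for t, _ in points) - 5 * 60 * 1000
        recent = [(t, v) for t, v in points if t >= cutoff_ms and v is not None]
        if len(recent) < 2:
            return False  # not enough data to judge a trend

        below_threshold = all(v < threshold for _, v in recent)

        # Ordinary least-squares slope of error rate against time.
        n = len(recent)
        mean_t = sum(t for t, _ in recent) / n
        mean_v = sum(v for _, v in recent) / n
        num = sum((t - mean_t) * (v - mean_v) for t, v in recent)
        den = sum((t - mean_t) ** 2 for t, _ in recent)
        slope = num / den if den else 0.0

        return below_threshold and slope <= 0

    # Example with an assumed 2% alert threshold.
    print(recovered(points, threshold=0.02))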

Step 5 (Datadog): Open the linked SLO from the service overview and check the burn rate widget.

The instantaneous burn rate is what matters for closing the incident. If it is still above 1, the service is consuming error budget faster than the SLO target allows, even if the raw error rate looks fine. Wait for the burn rate to drop below 1 before resolving the page.

Expected: The SLO burn rate widget shows a current value below 1 and trending toward baseline.
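The burn rate widget boils down to a simple ratio: how fast errors are consuming budget relative to what the SLO target allows. A sketch of that formula, using the 0.4 and 1.2 values that appear in the supporting actions below as worked examples; the SLO target and error fractions are illustrative.

    def burn_rate(observed_error_fraction, slo_target):
        """Burn rate = observed error fraction / error budget fraction.
        Below 1 means error budget is being consumed slower than the SLO allows."""
        error_budget = 1.0 - slo_target
        return observed_error_fraction / error_budget

    # A 99.9% availability SLO leaves a 0.1% error budget.
    print(burn_rate(0.0004, 0.999))  # ~0.4 -> safe to resolve
    print(burn_rate(0.0012, 0.999))  # ~1.2 -> hold off resolving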

Step 6 (Datadog): Open the service map view and inspect the immediate downstream consumers.

Click the affected service in the map and follow the outbound edges. Each downstream consumer should show its own error rate and latency. If a downstream is still red, the rollback fixed the source but the consumer is still draining queued errors and may need a separate restart or queue purge. Note any affected downstream so you can decide whether to declare a partial or a full recovery.

Expected: All immediate downstream services show error rates back at baseline, or any still affected are noted for follow up.
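If you prefer to script the downstream sweep, the same fetch_error_rate_points helper from the step 2 sketch works here. The service names are taken from the supporting actions below and are placeholders, as is the assumed baseline error rate.

    BASELINE_ERROR_RATE = 0.001  # assumed baseline; use your service's normal rate

    # Placeholder list of immediate downstream consumers read off the service map.
    DOWNSTREAM_SERVICES = ["checkout-service", "payments-service", "billing-worker"]

    still_affected = []
    for svc in DOWNSTREAM_SERVICES:
        query = (
            f"sum:trace.http.request.errors{{service:{svc}}}.as_rate()"
            f" / sum:trace.http.request.hits{{service:{svc}}}.as_rate()"
        )
        points = fetch_error_rate_points(query, window_s=5 * 60)
        latest = next((v for _, v in reversed(points) if v is not None), None)
        if latest is None or latest > BASELINE_ERROR_RATE:
            still_affected.append(svc)

    print("Needs follow-up:", still_affected or "none - full recovery")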

Related patterns
How this pattern connects to other patterns in the library.
Supporting actions
Actions that provide evidence for this pattern.
Verified api-gateway recovery, error rate flat, SLO burn 0.4
Checked checkout-service downstream after rollback, all green
Waited 6 min post-deploy before declaring billing-worker recovered
Spotted lingering errors on payments-service, downstream of api-gateway
SLO burn still 1.2 on checkout-service, held off resolving for 4 more min
Metadata
Timestamps and identifiers.
Evidence: Observed 27 times across 5 connections
Applications: Datadog
First seen: 2 Feb 2026, 14:01
Last seen: 5 May 2026, 18:32