SRE Agent overview

preview

We're still working on this feature, but we'd love for you to try it out!

This feature is currently provided as part of a preview program pursuant to our pre-release policies.

Welcome to the New Relic SRE Agent public preview! We're excited to introduce you to a new way of managing incidents. The SRE Agent is designed to assist in incident management workflows by helping to automate certain data correlation and diagnosis.

Caution

Compliance restrictions: Due to security and compliance requirements, you must not use the New Relic AI MCP server if your accounts or data fall under FedRAMP or HIPAA compliance mandates. During public preview, this feature is out of scope for FedRAMP- and HIPAA-regulated accounts.

Retention periods for history/MELT data in the agent may differ from your general account retention settings.

The New Relic SRE Agent includes access to Generative AI resources and is subject to the Generative AI Service Specific Terms.

What is the New Relic SRE Agent?

The New Relic SRE Agent is an intelligent agent designed to be your on-call teammate. By interacting directly with your telemetry data and established workflows, it moves beyond simple alerting to provide assistance with active investigation, diagnosis, and remediation.

Key capabilities

Here are some examples of how the SRE Agent can support your operations:

Alert triage and validation: The agent is designed to act as a preliminary filter for incoming notifications. It can help evaluate the context of an alert against historical baselines and current system health to assist in determining if an event is a high-priority anomaly or an expected fluctuation.
Real-time entity health: The agent can provide a quick, high-level pulse check on monitored parts of your stack. By attempting to correlate available golden signal metrics with underlying infrastructure performance, it aims to generate a narrative summary of system health.
Change intelligence: It can help identify potential performance regressions, new error patterns, or latency spikes introduced by recent updates, so you can check whether your system is healthy after a deployment.
Deep incident analysis: When system degradation occurs, the agent is designed to assist with automated root-cause discovery. It can traverse distributed traces, analyze log patterns, and inspect resource metrics to help identify potential technical failures or downstream dependencies that may be contributing to an outage.
Automated remediation: Beyond diagnosis, the SRE Agent aims to assist with recovery efforts. Once a potential root cause is identified, the agent may suggest technical interventions, such as rolling back a deployment or adjusting resource allocations, which should be reviewed by an engineer before implementation.
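
To make the alert triage idea above more concrete, here is a minimal Python sketch of comparing a new reading against a historical baseline. It is an illustration only and does not describe the SRE Agent's internal logic, which this page does not document. The function name `classify_alert`, the z-score threshold, and the sample latency values are all hypothetical.

```python
from statistics import mean, stdev


def classify_alert(history, current, z_threshold=3.0):
    """Label a reading as "anomaly" or "expected" using a simple z-score.

    Hypothetical helper for illustration; not part of any New Relic API.
    """
    if len(history) < 2:
        # stdev needs at least two samples to be defined.
        return "insufficient-data"
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return "expected" if current == baseline else "anomaly"
    z_score = abs(current - baseline) / spread
    return "anomaly" if z_score >= z_threshold else "expected"


# Hypothetical recent latency samples, in milliseconds.
latency_ms = [120, 118, 125, 122, 119, 121, 123]

print(classify_alert(latency_ms, 124))  # close to the baseline
print(classify_alert(latency_ms, 480))  # far outside the baseline
```

A fixed z-score threshold is the simplest possible baseline check. Real triage would also need to account for seasonality, current system health, and related alerts, as described above.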

How the SRE Agent analyzes your environment

The SRE Agent is designed to act as an intelligent layer between your telemetry data and potential insights. When you ask the agent a question, it performs an analysis by leveraging a specialized suite of tools.

The agent’s analysis process rests on three pillars:

Comprehensive context gathering: The agent is designed to gather relevant details, such as mapping available architecture. It can help identify relevant entities, explore their relationships and dependencies, and highlight recent change events or deployments that may have triggered a shift in performance.
Deep telemetry investigation: Once the context is established, the agent dives into the problem. It can help evaluate:
- Metrics and trends: Analyzing throughput, latency, and error rates, and attempting to forecast future system behavior.
- Logs and errors: Sifting through distributed traces, error groups, and log patterns to help isolate potential sources and causes of an anomaly.
- Health and alerts: Reviewing active incidents, alert policies, and security vulnerabilities to understand the potential blast radius.
Automated root cause and impact analysis: The agent is designed to assist with causal analysis. It attempts to correlate performance dips with specific transaction patterns, infrastructure bottlenecks, and end-user impact. It can also help convert natural language into NRQL queries to retrieve specific data points.
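
As a rough illustration of the natural-language-to-NRQL step, the Python sketch below builds the kind of query that a question such as "How has latency for checkout-service looked over the last hour?" might correspond to. The function `build_latency_query` and the app name are hypothetical, and the query the agent actually generates may differ. The sketch assumes the standard APM `Transaction` event with its `duration` and `appName` attributes.

```python
def build_latency_query(app_name, minutes=60):
    """Build an NRQL query for average and 95th-percentile transaction duration.

    Hypothetical helper for illustration; not part of any New Relic API.
    """
    # Reject quotes and backslashes rather than guessing at escaping rules.
    if "'" in app_name or "\\" in app_name:
        raise ValueError("app_name must not contain quotes or backslashes")
    return (
        "SELECT average(duration), percentile(duration, 95) "
        "FROM Transaction "
        f"WHERE appName = '{app_name}' "
        f"SINCE {int(minutes)} minutes ago TIMESERIES"
    )


print(build_latency_query("checkout-service"))
```

You can run the resulting query string in the query builder to compare it with what the agent returns for a similar question.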

Billing

The SRE Agent is a partially paid feature, even during this preview period. We are not charging for token usage. However, auxiliary compute costs from downstream services may apply. Usage is billed under the Advanced and Core Compute product according to your account's pricing model. When the feature becomes generally available, billing will continue based on the terms in your New Relic agreement.

Next steps

Set up SRE Agent

Configure prerequisites, permissions, and alert integration to get your SRE Agent running.

Use SRE Agent

Learn usage tips and best practices for effective incident analysis and investigation.