Use New Relic Infrastructure's Host not reporting alert condition to notify you when New Relic has stopped receiving data from an Infrastructure agent. This Infrastructure feature allows you to dynamically alert on groups of hosts, configure the time window from five to 60 minutes, and take full advantage of New Relic Alerts.
Anyone can view alerts tied to your New Relic Infrastructure account. Only Owners, Admins, or Add-on Managers can create, modify, or delete conditions.
New Relic Infrastructure allows you to define conditions based on the sets of hosts most important to you, and configure custom alerting thresholds appropriate for each filter set. The Host not reporting event triggers when data from the Infrastructure agent does not reach the New Relic collector within the time frame you specify.
This feature's flexibility allows you to easily customize what to monitor and when to notify selected individuals or teams. In addition, the email notification includes links to help you quickly troubleshoot the situation.
| Host not reporting condition | Features |
|---|---|
| What to monitor | You can use filter sets to select which hosts you want the alert condition to monitor. The condition also applies automatically to any hosts you add in the future that match these filters. |
| How to notify | Alert conditions apply to alert policies. You can select an existing policy or create a new policy with email notifications from the New Relic Infrastructure UI. To create a new policy with other types of notification channels, use New Relic Alerts. |
| When to notify | The email addresses identified in the Infrastructure alert policy are notified automatically about threshold violations for any host matching the filters you have applied. |
| Where to troubleshoot | The link at the top of the email notification takes you to the Infrastructure Events page, centered on the time when the host disconnected. Other links in the email take you to more details in New Relic Alerts. |
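The filter-set behavior described under "What to monitor" can be modeled in a few lines. This is a minimal Python sketch under stated assumptions, not New Relic's implementation: the `Host` type, the attribute names (`environment`, `role`), and the exact-match filter semantics are hypothetical. It illustrates why hosts added later are covered: the filter is re-evaluated against the current host list on every check instead of being frozen when the condition is created.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host record with arbitrary key/value attributes."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)


def matches(host: Host, filter_set: dict[str, str]) -> bool:
    """Return True if the host has every attribute value the filter set requires.

    Exact-match semantics are an assumption made for this sketch.
    """
    return all(host.attributes.get(key) == value for key, value in filter_set.items())


def monitored_hosts(hosts: list[Host], filter_set: dict[str, str]) -> list[str]:
    """Re-evaluate the filter set against the current host list on every check.

    Because matching happens at evaluation time rather than when the condition
    is created, a host added later is picked up automatically if it matches.
    """
    return [host.name for host in hosts if matches(host, filter_set)]


if __name__ == "__main__":
    filter_set = {"environment": "production", "role": "web"}
    hosts = [
        Host("web-1", {"environment": "production", "role": "web"}),
        Host("db-1", {"environment": "production", "role": "database"}),
    ]
    print(monitored_hosts(hosts, filter_set))  # ['web-1']

    # A host added after the condition was created is matched on the next check.
    hosts.append(Host("web-2", {"environment": "production", "role": "web"}))
    print(monitored_hosts(hosts, filter_set))  # ['web-1', 'web-2']
```

Evaluating the filter lazily is what makes the condition "dynamic": nobody has to edit the condition when a matching host joins the fleet.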
Create a "Host not reporting" condition
To define the Host not reporting alert criteria:
- Follow standard procedures to create a New Relic Infrastructure alert condition.
- Select Host not reporting as the Alert type.
- Define the Critical threshold for triggering the alert notification: minimum 5 minutes, maximum 60 minutes.
Your alert policy defines which people or teams New Relic notifies, and through which notification channels, when the alert condition's Critical threshold is exceeded. To avoid false positives, the host must stop reporting for the entire threshold period before New Relic triggers the alert.
Example: You create a condition that triggers when any host in the filtered set stops reporting data for seven minutes.
- If any host stops reporting for five minutes, then resumes reporting, New Relic does not trigger the alert.
- If any host stops reporting for seven minutes, even if the others are fine, New Relic does trigger the alert.
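The threshold rule in this example can be expressed as a simple check on each host's most recent report. The following Python sketch is a hypothetical model, not New Relic's evaluation logic: the `last_seen` mapping, the function names, and treating a silence of exactly the threshold as a violation are assumptions for illustration. It does encode the documented 5 to 60 minute range and the rule that the host must be silent for the whole window.

```python
from datetime import datetime, timedelta

# Bounds mirroring the documented Critical threshold range (5 to 60 minutes).
MIN_THRESHOLD_MINUTES = 5
MAX_THRESHOLD_MINUTES = 60


def validate_threshold(minutes: int) -> int:
    """Reject thresholds outside the documented 5-60 minute range."""
    if not MIN_THRESHOLD_MINUTES <= minutes <= MAX_THRESHOLD_MINUTES:
        raise ValueError(
            f"threshold must be between {MIN_THRESHOLD_MINUTES} "
            f"and {MAX_THRESHOLD_MINUTES} minutes"
        )
    return minutes


def violating_hosts(last_seen: dict[str, datetime], now: datetime,
                    threshold_minutes: int) -> list[str]:
    """Return hosts whose most recent data is at least the full threshold old.

    A host that went silent briefly and then reported again has a recent
    last_seen timestamp, so it does not violate. This models the rule that the
    host must stop reporting for the entire window. The ">=" boundary is an
    assumption of this sketch.
    """
    window = timedelta(minutes=validate_threshold(threshold_minutes))
    return sorted(host for host, seen in last_seen.items() if now - seen >= window)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    last_seen = {
        "host-a": now - timedelta(minutes=1),  # was silent for 5 minutes, then resumed
        "host-b": now - timedelta(minutes=7),  # silent for the full 7 minutes
        "host-c": now,                         # reporting normally
    }
    print(violating_hosts(last_seen, now, threshold_minutes=7))  # ['host-b']
```

Even at the longest point of its gap, `host-a` was silent for only 5 minutes, which is below the 7-minute threshold, so it never violates. Only `host-b` crosses the threshold, regardless of how the other hosts behave.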
Investigate the problem
To further investigate why a host is not reporting data:
- Review the details in the alert email notification.
- Use the link from the email notification to monitor ongoing changes in your environment from New Relic Infrastructure's Events page. For example, use the Events page to help determine if a host disconnected right after a root user made a configuration change to the host.
- Optional: Use the email notification's Acknowledge link to confirm that you are aware of the incident and are taking ownership of it.
- Use the email links to examine additional details in the Incident details page in New Relic Alerts.
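The correlation step above (checking whether a configuration change happened just before the host disconnected) is something the Events page shows visually. To make the idea concrete, here is a minimal Python sketch over a hypothetical in-memory list of events. The `Event` type, its fields, the sample event text, and the 15-minute lookback default are all assumptions for illustration. They are not a New Relic API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """Hypothetical change event, loosely modeled on an Events page entry."""
    host: str
    timestamp: datetime
    summary: str


def events_before_disconnect(events: list[Event], host: str, disconnected_at: datetime,
                             lookback_minutes: int = 15) -> list[Event]:
    """Return the host's events in the lookback window ending at the disconnect, oldest first."""
    start = disconnected_at - timedelta(minutes=lookback_minutes)
    window = [e for e in events if e.host == host and start <= e.timestamp <= disconnected_at]
    return sorted(window, key=lambda e: e.timestamp)


if __name__ == "__main__":
    disconnected_at = datetime(2024, 1, 1, 12, 0)
    events = [
        Event("web-1", datetime(2024, 1, 1, 11, 58), "root user edited /etc/ssh/sshd_config"),
        Event("web-1", datetime(2024, 1, 1, 10, 0), "package updated"),
        Event("db-1", datetime(2024, 1, 1, 11, 59), "service restarted"),
    ]
    for e in events_before_disconnect(events, "web-1", disconnected_at):
        print(e.timestamp, e.summary)  # 2024-01-01 11:58:00 root user edited /etc/ssh/sshd_config
```

Narrowing to one host and a short window before the disconnect keeps unrelated changes (the older package update, or activity on other hosts) out of the candidate causes.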
When a host stops reporting data, New Relic does not distinguish between unexpected outages and planned situations such as:
- Host has been taken offline intentionally.
- Host has planned downtime for maintenance.
- Host has been shut down or decommissioned.
New Relic applies your Host not reporting condition even during intentional outages. Be sure to investigate whether the event was planned or unexpected. You can also use the Incident details page in New Relic Alerts to close the incident manually.