Create Infrastructure "host not reporting" condition

Use New Relic Infrastructure's Host not reporting alert condition to be notified when we stop receiving data from an Infrastructure agent. This feature lets you dynamically alert on groups of hosts, configure the time window from 5 to 60 minutes, and take full advantage of New Relic Alerts.

Anyone can view alerts tied to your account. Only the Owner, Admins, or Add-on Managers can create, modify, or delete conditions.

Features

You can define conditions based on the sets of hosts most important to you, and configure thresholds appropriate for each filter set. The Host not reporting event triggers when data from the Infrastructure agent does not reach our collector within the time frame you specify.

This feature's flexibility allows you to easily customize what to monitor and when to notify selected individuals or teams. In addition, the email notification includes links to help you quickly troubleshoot the situation.

Host not reporting condition features
What to monitor

You can use filter sets to select which hosts the alert condition monitors. The condition also applies automatically to any hosts you add in the future that match these filters.

How to notify

Alert conditions are contained in alert policies. You can select an existing policy or create a new policy with email notifications from the Infrastructure UI. If you want to create a new policy with other types of notification channels, use the Alerts UI.

When to notify

Email addresses (identified in the alert policy) will be notified automatically about threshold violations for any host matching the filters you have applied, depending on the policy's incident preferences.

Where to troubleshoot

The link at the top of the email notification takes you to the Infrastructure Events page, centered on the time when the host disconnected. Other links in the email take you to further details in Alerts.
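
As an illustration of the filter-set behavior described under "What to monitor", here is a minimal Python sketch. It is not New Relic's implementation: the Host class, the attribute names (environment, role), and the exact-match rule are assumptions made for the example. It shows why a host added later is covered without editing the condition: matching is evaluated against whatever hosts exist at check time.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host record; attribute names are illustrative only."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)


def matches(host: Host, filter_set: dict[str, str]) -> bool:
    """Return True when the host carries every attribute value in the filter set."""
    return all(host.attributes.get(key) == value for key, value in filter_set.items())


def monitored_hosts(hosts: list[Host], filter_set: dict[str, str]) -> list[str]:
    """Names of the hosts the condition would cover right now."""
    # Evaluated at check time, so hosts added after the condition
    # was created are picked up as soon as they match.
    return [host.name for host in hosts if matches(host, filter_set)]


filter_set = {"environment": "production", "role": "web"}
fleet = [
    Host("web-1", {"environment": "production", "role": "web"}),
    Host("db-1", {"environment": "production", "role": "database"}),
]
before = monitored_hosts(fleet, filter_set)

# A new host that matches the filters is covered without changing the condition.
fleet.append(Host("web-2", {"environment": "production", "role": "web"}))
after = monitored_hosts(fleet, filter_set)
```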

Create "host not reporting" condition

To define the Host not reporting alert criteria:

  1. Follow standard procedures to create an Infrastructure alert condition.
  2. Select Host not reporting as the Alert type.
  3. Define the Critical threshold for triggering the alert notification: minimum 5 minutes, maximum 60 minutes.
  4. Enable the 'Don't trigger alerts for hosts that perform a clean shutdown' option if you want to prevent false alerts when hosts are set to shut down via the command line.
    Currently this feature is supported on all Windows systems and Linux systems using systemd.
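
The settings from the steps above can be modeled in a few lines of Python, for example to check a planned condition before entering it in the UI. This is a sketch only: the class and field names are hypothetical and do not correspond to any New Relic API. The only facts taken from this page are the 5 to 60 minute range and the clean-shutdown option.

```python
from dataclasses import dataclass

# Bounds stated in step 3 above.
MIN_THRESHOLD_MINUTES = 5
MAX_THRESHOLD_MINUTES = 60


@dataclass(frozen=True)
class HostNotReportingCondition:
    """Hypothetical local model of a Host not reporting condition."""
    name: str
    critical_threshold_minutes: int
    ignore_clean_shutdown: bool = False  # mirrors the option in step 4

    def __post_init__(self) -> None:
        # Reject thresholds outside the documented range.
        if not MIN_THRESHOLD_MINUTES <= self.critical_threshold_minutes <= MAX_THRESHOLD_MINUTES:
            raise ValueError(
                f"Critical threshold must be between {MIN_THRESHOLD_MINUTES} "
                f"and {MAX_THRESHOLD_MINUTES} minutes, "
                f"got {self.critical_threshold_minutes}"
            )


condition = HostNotReportingCondition(
    name="Web hosts not reporting",
    critical_threshold_minutes=7,
    ignore_clean_shutdown=True,
)
```

Constructing the object with a threshold such as 3 or 61 raises a ValueError.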

When a host exceeds the condition's Critical threshold, the alert policy determines which notification channels we use, subject to the policy's incident preferences. To avoid false positives, the host must stop reporting for the entire time period before a violation is opened.

Example: You create a condition to open a violation when any of the filtered set of hosts stop reporting data for seven minutes.

  • If any host stops reporting for five minutes, then resumes reporting, the condition does not open a violation.
  • If any host stops reporting for seven minutes, even if the others are fine, the condition does open a violation.
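
The example above can be expressed as a small Python sketch. It is a simplified model, not the actual evaluation logic of Alerts: the function name, the host names, and the choice to treat a gap exactly equal to the threshold as a violation are assumptions.

```python
from datetime import datetime, timedelta


def hosts_in_violation(
    last_seen: dict[str, datetime], now: datetime, threshold: timedelta
) -> list[str]:
    """Return hosts whose silence has lasted at least the full threshold."""
    # A host that resumed reporting has a recent last_seen time,
    # so a shorter gap never opens a violation.
    return sorted(host for host, seen in last_seen.items() if now - seen >= threshold)


now = datetime(2024, 1, 1, 12, 0, 0)
threshold = timedelta(minutes=7)

last_seen = {
    "host-a": now - timedelta(minutes=5),   # silent for five minutes: no violation yet
    "host-b": now - timedelta(minutes=7),   # silent for seven minutes: violation
    "host-c": now - timedelta(seconds=30),  # reporting normally
}

violations = hosts_in_violation(last_seen, now, threshold)
```

Here only host-b is returned, even though the other hosts are fine, which matches the second bullet above.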

Investigate the problem

To further investigate why a host is not reporting data:

  1. Review the details in the alert email notification.
  2. Use the link from the email notification to monitor ongoing changes in your environment from Infrastructure's Events page. For example, use the Events page to help determine if a host disconnected right after a root user made a configuration change to the host.
  3. Optional: Use the email notification's Acknowledge link to confirm that you are aware of the alerting incident and are taking ownership of it.
  4. Use the email links to examine additional details in the Incident details page in Alerts.

Intentional outages

We can distinguish between unexpected situations and planned situations with the option 'Don't trigger alerts for hosts that perform a clean shutdown'. Use this option for situations such as:

  • Host has been taken offline intentionally.
  • Host has planned downtime for maintenance.
  • Host has been shut down or decommissioned.

We do not recommend using Host not reporting alert conditions for autoscaling hosts.

For more help

Recommendations for learning more: