This applies only to the legacy alerting system, not New Relic Alerts.
New Relic's legacy alerting system includes a basic "availability monitoring" feature. It simply verifies application availability by making regular requests to apps and recording errors. When it receives an error accessing your web app, the legacy alerting system sends a notification, sometimes referred to as a "downtime alert."
This functionality has been superseded by the enhanced monitoring and alerting capabilities of New Relic Synthetics. Synthetics provides in-depth scriptable testing, including real browser tests and testing of API endpoints. Synthetics also includes free ping monitoring, which allows you to monitor your website from geographical locations around the world. This is why New Relic Alerts does not refer to an "availability monitoring" feature.
Enable downtime alerts
To set up downtime alerting with the legacy alerting system, use the Application alert policies page for the legacy alerting system:
- From rpm.newrelic.com, select APM > Alerts > Application policies > (selected app policy).
- To enable alerting, select Downtime alerts > ON.
- Select the number of minutes to wait before sending an alert (maximum ten minutes), or use the default value.
- Select Save your changes.
If you enable downtime alerts in your application's alert policies, New Relic will monitor your applications for ongoing availability by using an external pinger service. New Relic supports availability monitoring and downtime alerts at no extra charge on all product levels. However, some reports that include availability data (for example, the APM SLA report) may not be available for all products.
If you are getting false alarms, try increasing the time to wait before sending an alert.
Set the application URL
Downtime alerting supports only one URL. If you need to monitor many URLs, use New Relic Synthetics.
When enabling downtime alerting for an application, specify a URL for the availability monitor system to ping.
- From rpm.newrelic.com, select APM > Applications > (selected app) > Settings > Availability monitoring.
- Type the target URL you want to monitor.
- Optional: Specify a required response substring for the application, and select the checkbox options for how to treat redirects and the SSL certificate.
- Select Save your changes.
The User-Agent header sent with the ping request contains the value NewRelicPinger/1.0 (your_account_id). You can specify webpage targets with HTTP and HTTPS URLs. The pinger uses HEAD requests. If a request fails, or if you are using response substrings, the pinger will use GET requests instead. The URLs may include query strings.
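The HEAD-then-GET behavior described above can be approximated from your own command line. The following is a minimal sketch, not New Relic's implementation: the function names, the 10-second timeout, and the rule that only 2xx codes count as success are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: approximates the documented HEAD-then-GET behavior with cURL.
# Assumptions: 10-second timeout, and only 2xx status codes count as success.

# probe_status METHOD URL ACCOUNT_ID
# Prints the HTTP status code (cURL prints 000 when there is no response).
probe_status() {
  if [ "$1" = "HEAD" ]; then
    curl --silent --head --output /dev/null --max-time 10 \
      --write-out '%{http_code}' \
      -H "User-Agent: NewRelicPinger/1.0 ($3)" "$2"
  else
    curl --silent --output /dev/null --max-time 10 \
      --write-out '%{http_code}' \
      -H "User-Agent: NewRelicPinger/1.0 ($3)" "$2"
  fi
}

# is_success CODE
# Exit status 0 for a 2xx code, 1 otherwise.
is_success() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# check_url URL ACCOUNT_ID
# Tries HEAD first and falls back to GET if HEAD fails.
# Prints the final status code; exit status reflects success or failure.
check_url() {
  code="$(probe_status HEAD "$1" "$2")"
  if ! is_success "$code"; then
    code="$(probe_status GET "$1" "$2")"
  fi
  printf '%s\n' "$code"
  is_success "$code"
}
```

For example, check_url http://www.somehost.com 1 prints the final status code and returns a non-zero exit status if both requests fail.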
New Relic has several pinger servers distributed around the globe. If a downtime event does occur, you can see the region where it was reported from and the number of failed checks from each region. If your firewall restricts access to your monitoring URL, whitelist the pinger URLs.
New Relic checks your site approximately every 20 seconds. When we detect a failure, New Relic increases the rate to once every 10 seconds until the site recovers. This gives you much more fine-grained information about when your site recovered, as well as more accurate estimates for a rate of failure when there is a partial failure.
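The adaptive check rate can be illustrated with a small loop. This is a sketch under stated assumptions rather than the pinger's actual code: the function names, the log format, and the 10-second request timeout are hypothetical, and only the 20-second and 10-second intervals come from the description above.

```shell
#!/bin/sh
# Sketch only: check every 20 seconds while the site is up,
# and every 10 seconds after a failure until it recovers.

# next_interval RESULT
# Prints the number of seconds to wait before the next check.
next_interval() {
  if [ "$1" = "ok" ]; then
    echo 20
  else
    echo 10
  fi
}

# monitor URL
# Loops forever, logging one line per check (UTC timestamp, result, URL).
monitor() {
  while :; do
    if curl --silent --fail --head --output /dev/null --max-time 10 "$1"; then
      result=ok
    else
      result=fail
    fi
    printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$result" "$1"
    sleep "$(next_interval "$result")"
  done
}
```

Running monitor http://www.somehost.com from a machine outside your own network produces a log in which the shorter gaps between entries mark the periods when checks were failing.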
Often customers have problems only intermittently. Other services may miss these or mis-categorize them as one-off events. Because of frequent re-checking, New Relic has greater success surfacing situations where you have a fraction of your page requests failing.
Similarly, your users can sometimes experience downtime even if you can access your site. New Relic has pingers in Europe, Asia, and the United States. This means we may catch broken network paths to your site, even when your own network path is working. This often indicates a temporary network glitch in a hosting provider.
A ping is not the same as a Linux "ping" command, which checks to see if the interface to your system is live. New Relic's availability monitoring is a more extensive test; it verifies your web server is functioning correctly by accessing a webpage on your site.
You can also provide substring text the pinger must receive to record a valid result. To set this substring text: From rpm.newrelic.com, select APM > Applications > (selected app) > Settings > Availability monitoring.
New Relic checks approximately the first 57KB of data returned, including the HTTP headers. If the structure of your page varies, the size of the data preceding the substring may exceed 57KB. When this occurs, your substring will not be found, and New Relic will record an error.
Place the substring near the beginning of the page to avoid false errors. Some customers create a simple page solely to check availability.
Also, be aware of these requirements:
- Substring requests use GET and are case sensitive.
- Regular expressions, JSON responses from your webpage, and wildcards are not supported.
- You cannot use the response substring when specifying Treat redirects as a valid result. Allowing redirects only checks for a 200 response from the redirect; it does not acquire any text to compare.
If the pinger does not find the exact substring text, it responds with the error message Content not found. The text in parentheses next to the URL indicates the text string the pinger expected to see.
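To see in advance whether a substring is likely to be found, you can emulate the check locally. The sketch below assumes that 57KB means 57 × 1024 = 58368 bytes (the documented limit is only approximate) and uses hypothetical function names. It performs a case-sensitive, fixed-string match on the headers plus body, in line with the requirements listed above.

```shell
#!/bin/sh
# Sketch only: test whether a substring appears in roughly the first 57KB
# of a response, headers included. 58368 bytes (57 * 1024) is an assumption.

# contains_in_prefix SUBSTRING
# Reads a raw response on stdin. Succeeds if SUBSTRING occurs in the first
# 58368 bytes. grep -F does a case-sensitive, non-regex match; a substring
# that spans a line break will not match.
contains_in_prefix() {
  head -c 58368 | grep -F -q -- "$1"
}

# check_substring URL SUBSTRING
# Fetches URL with GET (headers included) and applies the prefix check.
check_substring() {
  curl --silent --include --max-time 10 "$1" | contains_in_prefix "$2"
}
```

For example, check_substring http://www.somehost.com "Welcome" returns a non-zero exit status if the text is missing or first appears beyond the limit.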
API for maintenance events
You may want to disable pinging for an application during maintenance periods to avoid affecting your Available % and SLA numbers. In this situation, change the pinger URL settings, or, from the legacy alerting system in New Relic APM, select Downtime alerts > OFF.
The availability monitoring settings user interface also describes how to disable and enable pinging with New Relic REST API calls. To use the REST API, send a POST request to the enable or disable action, including your account ID, application ID, and API key (in the X-Api-Key header). Here are examples using cURL:
To enable pinging:
curl https://api.newrelic.com/accounts/<acct_id>/applications/<app_id>/ping_targets/enable -X POST -H "X-Api-Key:<api_key>"
To disable pinging:
curl https://api.newrelic.com/accounts/<acct_id>/applications/<app_id>/ping_targets/disable -X POST -H "X-Api-Key:<api_key>"
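If you run maintenance from a script, the two calls can be wrapped around the maintenance command so that pinging is re-enabled even when the command fails. This is a sketch, not an official tool: the function names and the NR_ACCOUNT_ID, NR_APP_ID, and NR_API_KEY environment variables are hypothetical, and only the endpoint and header shown in the cURL examples are taken from this page.

```shell
#!/bin/sh
# Sketch only: disable pinging, run a maintenance command, then re-enable.
# NR_ACCOUNT_ID, NR_APP_ID, and NR_API_KEY are hypothetical variable names.

# ping_target_url ACCOUNT_ID APP_ID ACTION
# Prints the REST endpoint for the enable or disable action.
ping_target_url() {
  printf 'https://api.newrelic.com/accounts/%s/applications/%s/ping_targets/%s\n' \
    "$1" "$2" "$3"
}

# set_pinging ACTION
# Sends the POST request for ACTION (enable or disable).
set_pinging() {
  curl --silent --show-error --fail -X POST \
    -H "X-Api-Key:${NR_API_KEY}" \
    "$(ping_target_url "$NR_ACCOUNT_ID" "$NR_APP_ID" "$1")"
}

# with_pinging_disabled COMMAND [ARGS...]
# Runs COMMAND with pinging disabled and returns COMMAND's exit status.
with_pinging_disabled() {
  set_pinging disable || return 1
  rc=0
  "$@" || rc=$?
  set_pinging enable || echo "warning: could not re-enable pinging" >&2
  return "$rc"
}
```

For example, with_pinging_disabled ./deploy.sh (where deploy.sh stands in for your own maintenance script) disables pinging, runs the script, and re-enables pinging afterward.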
You can also retroactively tag downtime events as maintenance events: From rpm.newrelic.com, select APM > Applications > (selected app) > Reports > Availability > (selected Downtime Duration link). This will remove the selected downtime event from reports and availability percentage calculations.
Here is a summary of some features that availability monitoring currently does not support.
|Feature|Details|
|---|---|
|Form submission, redirects, multiple URLs|New Relic does not support multiple URLs or form submission, and we do not follow redirects. The purpose of availability monitoring is to verify that your application server is reachable and responding, not to verify that your application itself is working properly. (That falls under the scope of the New Relic agent.) Availability monitoring detects the types of outages that are outside the scope of the application and would not otherwise be visible in New Relic.|
|Authentication|Authentication schemes such as OAuth or Basic Authentication are not currently supported but are under consideration as a future enhancement.|
|Server name indication (SNI)|Some sites use SSL for multiple domains behind a single IP address. New Relic does not support SNI, so availability monitoring will not be able to ping HTTPS URLs in such domains. If possible, configure the pinger to hit an HTTP URL.|
|Intranet|Typically you cannot set up availability monitoring for your intranet site unless users can reach it from the outside.|
Troubleshooting downtime alerts
An outage will appear as a red vertical line in your app's charts. Here are some troubleshooting tips when using downtime alerts.
- Intermittent timeouts
New Relic pings your site approximately every 20 seconds and flags intermittent failures. This is why the availability monitoring service tends to be more sensitive than others. If the network link between New Relic pinger servers and your site is poor, you may see periodic timeout alerts on intermittent failures, even if there are no problems indicated by other monitors.
Even if you can access your site, your site might still be inaccessible for some of your users. With pingers in Europe, Asia, and the United States, New Relic may catch broken network paths to your site, even when your own network path is working. This often indicates a temporary network glitch in a hosting provider.
- Evaluating downtime
To check whether a downtime event is due to intermittent failure:
- Check your ISP or service provider's network status to see if there are any problems.
- Refer to the Server throughput chart in the alert detail view to see if you experienced a drop in server throughput or increase in response time.
- Look for an increase in application server queue time or server capacity (found in the Reports section available for certain platforms and product levels only).
- If you have enabled page load timing (sometimes referred to as real user monitoring or RUM), look at the End user throughput chart in the alert view for drops in end user page views during downtime. The page load timing instrumentation and availability monitoring are independent of one another. During a prolonged outage you should expect to see a noticeable drop in page load timing throughput.
- Try a secondary pinger service or a script using cURL to see if you can detect the same issue independently. Be aware that other pinging services will not be hitting your site as frequently.
- Consider increasing the minimum time threshold for generating alerts so that short periods of intermittent failures will not trigger alerts.
- False alarms
You may experience frequent false alarms if your server has a poor network connection to New Relic's pingers. You can increase the minutes of downtime needed before we send an alert to minimize these false alarms.
- Status errors
New Relic's pinger requests differ substantially from what most web browsers typically send. In nearly all cases this will not make a difference.
However, for some customers the request is rejected. This may be why you receive notification that your site is down with a status error (400, 404, etc.) but you are still able to open the URL. In some cases this might be caused by a more restrictive Accept header; in other cases it might be the user agent.
From a command line, try this cURL request to see if you can reproduce the failure:
curl -v \
  -H "Cache-Control: no-cache, max-age=0" \
  -H "User-Agent: NewRelicPinger/1.0 (1)" \
  -H "X-Newrelic-Ignore: true" \
  http://www.somehost.com > /dev/null
New Relic's code emulates this request closely, but differences in setup mean that this command will sometimes succeed even if the pinger request is failing.
Get support at support.newrelic.com when this happens. New Relic may be able to make requests more compatible.
- Downtime with 200 response code
New Relic pingers will receive no response when there are timeouts or DNS resolution issues. The Response Content area of this alert will capture the next successful response in an attempt to provide useful information. This is why you may see 200 response codes in downtime events.
For more help
Additional documentation resources include:
- Availability report (description, procedures, flag for maintenance events)