Alert on infrastructure processes

Use the Process running alert condition in New Relic infrastructure monitoring to get notified when a set of processes on your filtered hosts stops running for a configurable number of minutes. This is useful, for example, when:

Any of the processes on the hosts stops reporting
A process is running too many instances on one host

You can filter which hosts and processes to monitor and choose when to notify selected individuals or teams. In addition, the email notification includes links to help you quickly troubleshoot the situation.

Important

By default, the infrastructure agent doesn't send data about the operating system's processes. To enable process data, set enable_process_metrics to true. To fine-tune which processes you monitor, configure include_matching_metrics or exclude_matching_metrics.
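As a rough illustration of the settings named above, the following Python sketch prints a configuration fragment that turns on process metrics and limits reporting to a few process names. The helper name `render_process_config`, the `process.name` key, and the exact YAML layout are assumptions made for this example; check the agent's configuration reference before copying the output into your agent configuration file.

```python
def render_process_config(process_names):
    """Return config lines that enable process metrics for the given names.

    The nesting under include_matching_metrics (a "process.name" key with a
    list of names) is an assumed layout, not a confirmed schema.
    """
    lines = ["enable_process_metrics: true"]
    if process_names:
        lines.append("include_matching_metrics:")
        lines.append("  process.name:")
        for name in process_names:
            lines.append(f'    - "{name}"')
    return "\n".join(lines)


# Example: report only the two services discussed in this document.
print(render_process_config(["gunicorn", "chef-client"]))
```

Passing an empty list produces only the `enable_process_metrics: true` line, which corresponds to sending data for all processes without any extra filtering.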

Examples

By filtering on the hosts and processes that matter to your business, you can define alerting thresholds that decide when alert events open. New Relic then sends you an email notification, depending on the policy's alert event preferences. These examples illustrate how to use the Process running condition in infrastructure monitoring to monitor your processes.

Problem: Some load balancers and application servers work by running many worker processes in parallel. For example, you may want an alert event to open when fewer than eight processes are running for a service like gunicorn.

Solution: Depending on the situation, use any of these Process running threshold options as needed. For the gunicorn case above, the fewer-than option with a value of eight applies:

More than the defined number of processes are running
Exactly the defined number of processes are running
Fewer than the defined number of processes are running

Problem: Sometimes problems with processes can be solved with changes to your configuration. For example, if more than one Chef process is running, you may need to address an issue with how that service is configured.

Solution: Depending on the situation, use any of these Process running threshold options as needed. For the Chef case above, the more-than option with a value of one applies:

More than the defined number of processes are running
Exactly the defined number of processes are running
Fewer than the defined number of processes are running
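The three threshold options, combined with the configurable duration, can be sketched in Python as follows. This is only an illustration of the logic, assuming one process count sample per minute. The function name, the comparison labels, and the sampling model are hypothetical and do not describe how New Relic evaluates conditions internally.

```python
import operator

# Hypothetical labels for the three threshold options listed above.
COMPARISONS = {
    "more_than": operator.gt,
    "exactly": operator.eq,
    "fewer_than": operator.lt,
}


def threshold_violated(counts_per_minute, comparison, value, duration_minutes):
    """Return True if the most recent `duration_minutes` samples all match.

    counts_per_minute: process counts, oldest first, one sample per minute.
    """
    if duration_minutes < 1 or len(counts_per_minute) < duration_minutes:
        return False
    compare = COMPARISONS[comparison]
    recent = counts_per_minute[-duration_minutes:]
    return all(compare(count, value) for count in recent)


# gunicorn: alert when fewer than 8 workers run for 5 minutes.
gunicorn_counts = [8, 8, 6, 5, 7, 6, 6]
print(threshold_violated(gunicorn_counts, "fewer_than", 8, 5))  # True

# Chef: alert when more than 1 process runs for 5 minutes.
chef_counts = [1, 1, 2, 2, 2]
print(threshold_violated(chef_counts, "more_than", 1, 5))  # False
```

In the Chef sample, only the last three minutes exceed one process, so a 5-minute duration does not trigger, while a 3-minute duration would.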

Create an infrastructure process running condition

To define the Process running alert criteria:

Go to one.newrelic.com > All capabilities > Infrastructure. Mouse over a chart you want to alert on, select the ellipsis icon, and then select Create alert condition.
Type a meaningful condition name.
Select Process running as the Alert type.
Filter which hosts and processes you want the alert condition to apply to.
Define the Critical threshold that triggers the alert notification. The duration can range from 1 minute to 60 minutes; the default is 5 minutes.
Optional: To create the condition criteria proactively without receiving alert notifications yet, clear the Enabled checkbox.
Select an existing policy for the new condition.
OR
Select the option to create a new policy and identify the email for alert notifications.
Optional: Add a runbook URL.
Optional: Set the Close open alert events after time limit to automatically close open alert events after a certain amount of time (the default is 24 hours for infrastructure conditions).
Select Create.

If you create the alert condition directly in infrastructure monitoring, New Relic sends an email notification when the condition's defined threshold is breached, depending on the policy's alert event preferences. Your alert policy defines which people or teams are notified and which notification channels are used.
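To make the choices in the steps above concrete, here is a minimal Python sketch that models the condition as a plain data object and checks the duration limits mentioned in the steps (1 to 60 minutes, default 5). The class and all of its field names are hypothetical and exist only for illustration; they are not part of any New Relic API or SDK.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the three threshold options.
VALID_COMPARISONS = ("more_than", "exactly", "fewer_than")


@dataclass
class ProcessRunningCondition:
    """Illustrative model of the values entered in the UI steps."""
    name: str
    host_filter: str
    process_filter: str
    comparison: str
    process_count: int
    duration_minutes: int = 5      # default from the steps above
    enabled: bool = True           # clear to create without notifications
    runbook_url: Optional[str] = None

    def __post_init__(self):
        if not self.name.strip():
            raise ValueError("condition name must not be empty")
        if self.comparison not in VALID_COMPARISONS:
            raise ValueError(f"comparison must be one of {VALID_COMPARISONS}")
        if not 1 <= self.duration_minutes <= 60:
            raise ValueError("duration must be between 1 and 60 minutes")


# Example: the gunicorn scenario, keeping the default 5-minute duration.
condition = ProcessRunningCondition(
    name="gunicorn workers below capacity",
    host_filter="web hosts",
    process_filter="gunicorn",
    comparison="fewer_than",
    process_count=8,
)
print(condition.duration_minutes)  # 5
```

Validating the values up front mirrors what the form does: a duration outside the 1 to 60 minute range is rejected before the condition is created.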
