How alert condition incidents are closed

This document explains the different ways incidents can be closed.

How incidents automatically close

An incident will automatically close when the targeted signal returns to a non-breaching state for the time period indicated in the condition's thresholds. This wait time is called the recovery period.

For example: If the breaching behavior is "Apdex score below 0.80 at least once in 5 minutes," then the incident will automatically close when the Apdex score is equal to or higher than .80 for 5 consecutive minutes. The same applies to a "for at least x minutes" threshold: x minutes of non-breaching behavior are required to automatically close the incident.

When an incident closes automatically:

  1. The closing timestamp is backdated to the start of the recovery period.
  2. The evaluation resets and restarts from when the previous incident ended.

All conditions have an incident time limit setting that will automatically force-close a long-lasting incident.

Set a time limit for long-lasting incidents

The incident time limit setting will automatically force-close a long-lasting incident after the number of days/hours you select. This is most useful for ephemeral entities that, when they disappear, cause a continual incident that won't automatically close.

Limits and Defaults

  • All alert incidents will have an incident time limit applied to them. Most alert conditions will allow you to edit this field.
  • The default value, if one is not supplied during configuration, is 3 days (24 hours for infrastructure conditions).
  • The incident time limit for non-Infrastructure conditions can be set as low as 5 minutes, and as high as 30 days. If, for some reason, the signal is still breaching in 30 days, the incident will close, and a new incident will open. Infrastructure conditions can be set to the following hours: 1, 2, 4, 8, 12, 24, 48, or 72.

Tip

This setting is related to the inactive issue setting.

When the time periods in these two settings are different, our system uses the shorter time period, regardless of the setting. For example, if the close open incident time setting is 2 days and the inactive issue time setting is 3 days, our system would wait 2 days before closing the issue.

Examples:

  • You set the incident time limit to 12 hours. If that incident lasts for 12 hours, it will be closed at 12 hours and the condition's evaluation of that entity will be reset.
  • Your JVM has a CPU spike and this creates an incident. The JVM then crashes and is replaced by a new JVM. If you have not set an incident time limit, the crashed JVM’s incident will never close.