peterooney wrote:
Is there any way in which I can add a parameter to this to generate a critical alert if the service has been down for say, more than 30 minutes?
The key word here is alert. You can't change what indicates a CRITICAL state without modifying the plugin itself, but you can adjust when Nagios Core sends out an alert by tuning the retry_interval and max_check_attempts values.
If you wanted to say "if the problem persists for 30 minutes, send an alert," step 1 would be baking that logic into your service definition using the following retry_interval and max_check_attempts values:
Code: Select all
retry_interval 1        ; re-check every 1 minute while the problem state is SOFT
max_check_attempts 30   ; 30 check attempts before the state goes HARD
This means that once a problem is first detected, Nagios Core will re-check every 1 minute, up to a maximum of 30 check attempts, before triggering a HARD state. 1 * 30 = 30, so this gives us roughly 30 minutes before the first notification/alert is triggered.
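For context, here is a minimal sketch of how those two directives might sit inside a complete service definition. The template, host name, service description, and check command below are placeholders for illustration, not values from your setup:
Code: Select all
define service {
    use                    generic-service    ; placeholder template
    host_name              some-host          ; placeholder host
    service_description    HTTP
    check_command          check_http         ; whichever plugin you're running
    check_interval         5     ; normal checks every 5 minutes while OK
    retry_interval         1     ; re-check every minute once a problem is detected
    max_check_attempts     30    ; 30 attempts before the state goes HARD
}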
The first thing Nagios evaluates when determining whether an alert/notification should be sent is whether the host/service is in a HARD state. More on those topics here:
https://assets.nagios.com/downloads/nag ... tions.html
https://assets.nagios.com/downloads/nag ... types.html
However, maybe I don't want to check every 1 minute, 30 times over; that's a bit heavy on the system in general. Perhaps I'd rather check every 5 minutes but still keep the "don't alert unless this has been a problem for 30 minutes or longer" logic in there. I might do something like this instead:
Code: Select all
retry_interval 5        ; re-check every 5 minutes while the problem state is SOFT
max_check_attempts 6    ; 6 check attempts before the state goes HARD
With this configuration, once a problem is initially detected, Nagios Core will re-check every 5 minutes, up to a maximum of 6 check attempts. 5 * 6 = 30, so much like the previous example, this gives us roughly 30 minutes before an actual notification/alert is triggered.
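One assumption worth making explicit: the minute math above only holds if interval_length in your main nagios.cfg is still at its default of 60 seconds, since retry_interval is expressed in multiples of that value:
Code: Select all
# In nagios.cfg; each scheduling "interval" lasts this many seconds (60 is the default)
interval_length=60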
This all assumes you're not doing things like service/host check stalking, running a custom notification handler, etc.