Notification delay of 5 minutes

jpipitone · Post by **jpipitone** » Wed May 04, 2011 3:17 pm

We kept noticing that our NagiosXI alerts seemed a bit delayed. To test, we purposely shut down a server not in production, one that Nagios monitors.

I noticed in the Event Log that at 8:57, NagiosXI reported and logged that the host / services were down.

We noticed that notifications weren't sent out until 9:02, exactly 5 minutes later.

We would prefer Nagios to send out notifications immediately if a host and / or its services are down. In our host and service definitions, I noticed the following values under Check settings:

Retry interval: 1 min
Check interval: 5 min

What is the easiest way to configure our host and service definitions to send out notifications immediately? Do we just need to adjust the check interval to 1 min rather than 5?

Thanks for any help.

rdedon · Post by **rdedon** » Wed May 04, 2011 4:27 pm

Yes, that would be correct:

Monitoring Settings

Specify the parameters that determine how the host should be monitored.
Under normal circumstances...
Monitor the host every 5 minutes. (lower this)
When a potential problem is first detected...
Re-check the host every 1 minutes up to 5 times before generating an alert.

*Note: I would not set everything to 1 minute. Prioritize the mission-critical hosts and services, because shorter intervals make checks run more often and could overload the system, depending on the setup.

rdedon · Post by **rdedon** » Wed May 04, 2011 4:30 pm

Also, by default, under the Notifications tab it should look like this:
Notification Settings

Specify the parameters that determine how notifications should be sent for the host.
When a problem is detected...
Don't send any notifications
*Send a notification immediately
Wait 0 minutes before sending a notification

You may want to check if that has been altered as well.

mguthrie · Post by **mguthrie** » Wed May 04, 2011 4:44 pm

There's actually an alert setting in the Core Config Manager -> Host/Service -> Modify -> Alert Settings (tab) for "First Notification Delay." The default for this is probably 5 minutes in one of the generic templates. This setting is there in case you don't want a notification every time a host/service flaps in and out of a state.

"Monitor the host every [5] minutes."

This is actually the "check_interval" directive.

jpipitone · Post by **jpipitone** » Wed May 04, 2011 5:03 pm

As far as waiting 0 minutes before sending a notification, are you referring to "first_notification_delay" ?

If so, this variable is blank for the services, and a value of 0 is there for the host configs under the Alert Settings tab.

mguthrie · Post by **mguthrie** » Wed May 04, 2011 5:46 pm

jpipitone wrote: "As far as waiting 0 minutes before sending a notification, are you referring to "first_notification_delay" ?"

Correct. It's an optional config directive, so it may not be specified anywhere.
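For reference, here is a minimal sketch of how the directive could be set explicitly in a service definition, so the value does not depend on whatever a template happens to supply. The host name, service description, and the generic-service template name are placeholders rather than values from this thread, and the template is assumed to provide the remaining required directives.

```
define service {
    use                       generic-service   ; assumed template name
    host_name                 test-server       ; hypothetical host
    service_description       PING              ; hypothetical service
    first_notification_delay  0                 ; 0 = notify as soon as the problem is confirmed
}
```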

You can also run:
yum install ntp
ntpdate pool.ntp.org

To make sure your system time is correct.

jpipitone · Post by **jpipitone** » Thu May 05, 2011 7:32 am

Right, NTP is one of the first packages we install when building a CentOS machine...

We will have to go through and determine which hosts and services are the most important.

tonyyarusso · Post by **tonyyarusso** » Thu May 05, 2011 11:01 am

There are actually two things at work here. It is true that first_notification_delay will determine a delay (or lack thereof) between when Nagios determines there is a problem and when it decides to tell you about it. However, I don't think that is what you are dealing with here. (In fact, I think this is set to 0 by default.)

Rather, you are likely facing the fact that Nagios by default does not declare a definite problem at the first sign of one. Instead it tries to verify that something is in fact wrong and that it wasn't just a fluke result, which is a handy feature when you consider that a notification might wake up a cranky sleeping sysadmin.

When the first check result indicating a problem comes in, Nagios sets the host or service to what's called a "soft" problem state. It then switches from checking as often as specified by check_interval to the frequency specified by retry_interval. At this stage it has not sent any notification. If the results keep coming back bad at the retry interval, then once Nagios has received the number of bad results specified by max_check_attempts it assigns the host or service a "hard" problem state. It is at this point that a notification will be sent.

Since by default retry_interval is 1 minute and max_check_attempts is 5, this explains why the notification occurs 5 minutes after the initial indication of a problem. The solution would be to lower max_check_attempts to 1 (or if you want to get fancy you could make retry_interval shorter, but that raises other complications).
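As a rough sketch of that suggestion, a host definition with max_check_attempts lowered to 1 might look like the fragment below. The template name, host name, and address are placeholders, not taken from this thread, and the intervals are in minutes only if interval_length is left at its default of 60 seconds.

```
define host {
    use                 generic-host   ; assumed template name
    host_name           test-server    ; hypothetical host
    address             192.0.2.10     ; placeholder address (documentation range)
    check_interval      5              ; normal checks every 5 minutes
    retry_interval      1              ; re-check interval while in a soft problem state
    max_check_attempts  1              ; first bad result is already a hard state, so no retries
}
```

The trade-off is that a single failed check, including a fluke, now produces a notification, so this is probably best limited to the hosts and services that matter most.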
