Notification delay of 5 minutes

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jpipitone
Posts: 102
Joined: Tue Oct 12, 2010 1:21 pm

Notification delay of 5 minutes

Post by jpipitone »

We kept noticing that our Nagios XI alerts seemed a bit delayed. To test, we purposely shut down a non-production server that Nagios monitors.

I noticed in the Event Log that at 8:57, Nagios XI reported and logged that the host / services were down.

We noticed that notifications weren't sent out until 9:02, exactly 5 minutes later.

We would prefer Nagios to send out notifications immediately if a host and / or its services are down. In our host and service definitions, I noticed the following values under Check settings:

Retry interval: 1 min
Check interval: 5 min

What is the easiest way to configure our host and service definitions to send out notifications immediately? Do we just need to adjust the check interval to 1 min rather than 5?

Thanks for any help.
rdedon
Posts: 578
Joined: Sat Nov 20, 2010 4:51 pm

Re: Notification delay of 5 minutes

Post by rdedon »

Yes, that would be correct:
Monitoring Settings

Specify the parameters that determine how the host should be monitored.
Under normal circumstances...
Monitor the host every 5 minutes. (lower this)
When a potential problem is first detected...
Re-check the host every 1 minutes up to 5 times before generating an alert.
*Note: I would not set everything to 1 minute. Prioritize mission-critical hosts and services, because shorter intervals mean checks run more often and could overload the system, depending on the setup.
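To put rough numbers on that load concern, here is a minimal Python sketch. It assumes checks are spread evenly over their interval, and the object counts (2000 services, 100 of them mission critical) are made up for illustration, not taken from any real installation.

```python
def checks_per_minute(object_count: int, check_interval_min: float) -> float:
    """Average number of active checks scheduled per minute."""
    if check_interval_min <= 0:
        raise ValueError("check_interval_min must be positive")
    return object_count / check_interval_min


# Hypothetical installation: 2000 services, 100 of them mission critical.
everything_at_5 = checks_per_minute(2000, 5)
everything_at_1 = checks_per_minute(2000, 1)
# Only the mission-critical services move to a 1 minute interval.
prioritized = checks_per_minute(100, 1) + checks_per_minute(1900, 5)

print(everything_at_5, everything_at_1, prioritized)
```

Under these assumptions, moving everything to 1 minute multiplies the check rate by five, while prioritizing only the critical services raises it far less.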
Rene deDon
Technical Team
___
Nagios Enterprises, LLC
Web: http://www.nagios.com
rdedon
Posts: 578
Joined: Sat Nov 20, 2010 4:51 pm

Re: Notification delay of 5 minutes

Post by rdedon »

Also, by default, the Notifications tab should look like this:
Notification Settings
Specify the parameters that determine how notifications should be sent for the host.
When a problem is detected...
Don't send any notifications
*Send a notification immediately
Wait 0 minutes before sending a notification
You may want to check if that has been altered as well.
Rene deDon
Technical Team
___
Nagios Enterprises, LLC
Web: http://www.nagios.com
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Notification delay of 5 minutes

Post by mguthrie »

There's actually an alert setting in the Core Config Manager -> Host/Service -> Modify -> Alert Settings (tab) for "First Notification Delay." The default for this is probably 5 minutes in one of the generic templates. This setting is there in case you don't want notifications every time a host/service is flapping in and out of a state.
Monitor the host every [5] minutes.
This is actually the "check_interval" directive.
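If you would rather check the config text than click through each object, a small script can list every definition that sets a given directive. This is only a sketch in Python: the sample config, the template name generic-host, and the host web01 are invented for illustration, and the parser handles just the basic "define type { directive value }" layout. It does not resolve template inheritance through "use", so a blank value on an object may still be inherited from a template.

```python
import re

# Hypothetical config text; in practice you would read your own .cfg files.
SAMPLE_CONFIG = """
define host {
    name                      generic-host   ; hypothetical template
    first_notification_delay  5
    check_interval            5
    register                  0
}

define host {
    use                       generic-host
    host_name                 web01
    max_check_attempts        5
}
"""


def parse_objects(text):
    """Return a list of (object_type, {directive: value}) tuples."""
    objects = []
    current = None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop inline ';' comments
        if not line or line.startswith("#"):
            continue
        match = re.match(r"define\s+(\w+)\s*\{", line)
        if match:
            current = (match.group(1), {})
        elif line.startswith("}"):
            if current is not None:
                objects.append(current)
            current = None
        elif current is not None:
            parts = line.split(None, 1)
            current[1][parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return objects


def find_directive(objects, directive):
    """List (object_type, label, value) for each object that sets the directive."""
    hits = []
    for obj_type, fields in objects:
        if directive in fields:
            label = (fields.get("name") or fields.get("host_name")
                     or fields.get("service_description") or "?")
            hits.append((obj_type, label, fields[directive]))
    return hits


hits = find_directive(parse_objects(SAMPLE_CONFIG), "first_notification_delay")
print(hits)
```

The same call with "check_interval" or "max_check_attempts" shows where those values are set explicitly.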
jpipitone
Posts: 102
Joined: Tue Oct 12, 2010 1:21 pm

Re: Notification delay of 5 minutes

Post by jpipitone »

As far as waiting 0 minutes before sending a notification, are you referring to "first_notification_delay" ?

If so, this variable is blank for the services, and a value of 0 is there for the host configs under the Alert Settings tab.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Notification delay of 5 minutes

Post by mguthrie »

As far as waiting 0 minutes before sending a notification, are you referring to "first_notification_delay" ?
Correct. It's an optional config directive, so it may not be specified anywhere.

You can also run:
yum install ntp
ntpdate pool.ntp.org

To make sure your system time is correct.
jpipitone
Posts: 102
Joined: Tue Oct 12, 2010 1:21 pm

Re: Notification delay of 5 minutes

Post by jpipitone »

Right, NTP is one of the first packages we install when building a CentOS machine.

We will have to go through and determine which hosts and services are the most important.
tonyyarusso
Posts: 1128
Joined: Wed Mar 03, 2010 12:38 pm
Location: St. Paul, MN, USA

Re: Notification delay of 5 minutes

Post by tonyyarusso »

There are actually two things at work here. It is true that first_notification_delay will determine a delay (or lack thereof) between when Nagios determines there is a problem and when it decides to tell you about it. However, I don't think that is what you are dealing with here. (In fact, I think this is set to 0 by default.)

Rather, you are likely facing the fact that Nagios by default does not declare a problem at the first sign of one. Instead it tries to verify that something is in fact wrong and that it wasn't just a fluke result, a handy feature when you consider that a notification might wake up a cranky sleeping sysadmin. When the first check result indicating a problem comes in, Nagios sets the host or service to what's called a "soft" problem state. It then switches from checking as often as specified by check_interval to the frequency specified by retry_interval. At this stage it has not sent any notification. If the results keep coming back bad at the retry interval, then once Nagios has received the number of bad results specified by max_check_attempts, it assigns the host or service a "hard" problem state. It is at this point that a notification will be sent. Since by default retry_interval is 1 minute and max_check_attempts is 5, this explains why the notification occurs about 5 minutes after the initial indication of a problem. The solution would be to lower max_check_attempts to 1 (or if you want to get fancy you could make retry_interval shorter, but that raises other complications).
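The arithmetic behind that delay can be sketched in a few lines of Python. This is a simplified model, not Nagios code: it ignores scheduling latency and check execution time, and the function name and the first_result_counts flag are my own. My understanding is that Nagios Core counts the first bad result as attempt 1, which would give 4 minutes of retries with the defaults rather than 5, so the flag lets you compare both readings.

```python
def minutes_to_hard_state(retry_interval, max_check_attempts,
                          first_result_counts=True):
    """Minutes between the first bad result and the hard state.

    first_result_counts=True assumes the first bad result is attempt 1;
    False assumes max_check_attempts retries follow the first bad result.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    if first_result_counts:
        retries = max_check_attempts - 1
    else:
        retries = max_check_attempts
    return retries * retry_interval


# Defaults mentioned in this thread: retry_interval 1, max_check_attempts 5.
print(minutes_to_hard_state(1, 5))         # first result counted as attempt 1
print(minutes_to_hard_state(1, 5, False))  # first result not counted
# max_check_attempts lowered to 1: hard state on the first bad result.
print(minutes_to_hard_state(1, 1))
```

Either way, the delay shrinks as max_check_attempts or retry_interval goes down, and with max_check_attempts at 1 the first bad result is already a hard state in the first reading.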
Tony Yarusso
Technical Services
___
TIES
Web: http://ties.k12.mn.us/