We kept noticing that our NagiosXI alerts seemed a bit delayed. To test, we deliberately shut down a non-production server that Nagios monitors.
I noticed in the Event Log that at 8:57, NagiosXI reported and logged that the host / services were down.
We noticed that notifications weren't sent out until 9:02, exactly 5 minutes later.
We would prefer Nagios to send out notifications immediately if a host and / or its services are down. In our host and service definitions, I noticed the following values under Check settings:
Retry interval: 1 min
Check interval: 5 min
What is the easiest way to configure our host and service definitions to send out notifications immediately? Do we just need to adjust the check interval to 1 min rather than 5?
Thanks for any help.
Notification delay of 5 minutes
Re: Notification delay of 5 minutes
Yes, that would be correct.
*Note: I would not set everything to 1 minute. Prioritize mission-critical hosts and services, since shorter intervals make checks run more often and could overload the system, depending on the setup.
Monitoring Settings
Specify the parameters that determine how the host should be monitored.
Under normal circumstances...
Monitor the host every 5 minutes. (lower this)
When a potential problem is first detected...
Re-check the host every 1 minutes up to 5 times before generating an alert.
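As a rough sketch of how the wizard fields above map onto Nagios Core object directives, a mission-critical host could be checked every minute while everything else keeps the 5-minute default. The template name critical-host, the parent template generic-host, the host name, and the address are all hypothetical. In NagiosXI the same values would normally be entered through the Core Config Manager rather than written by hand.

```cfg
# Hypothetical template used only by mission-critical hosts.
# Intervals are in time units, which are minutes with the default interval_length.
define host {
    name                critical-host   ; hypothetical template name
    use                 generic-host    ; assumed parent template
    check_interval      1               ; "Monitor the host every N minutes"
    retry_interval      1               ; "Re-check the host every N minutes"
    max_check_attempts  5               ; "up to N times before generating an alert"
    register            0               ; template only, not a real host
}

# A host that inherits the faster schedule from the template above.
define host {
    use                 critical-host
    host_name           test-server-01  ; hypothetical host
    address             192.0.2.10      ; documentation-range address
}
```

Hosts that do not use the template keep their existing check_interval, which limits the extra check load to the systems that need it.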
Re: Notification delay of 5 minutes
Also, by default, the Notifications tab should look like this:
Notification Settings
Specify the parameters that determine how notifications should be sent for the host.
When a problem is detected...
Don't send any notifications
*Send a notification immediately
Wait 0 minutes before sending a notification
You may want to check whether that has been altered as well.
Re: Notification delay of 5 minutes
There's actually an alert setting in the Core Config Manager -> Host/Service -> Modify -> Alert Settings (tab) for "First Notification Delay." The default for this is probably 5 minutes in one of the generic templates. This setting is there in case you don't want notifications every time a host/service is flapping in and out of a state.
"Monitor the host every [5] minutes."
This is actually the "check_interval" directive.
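For illustration, setting the first notification delay explicitly in an object definition might look like the sketch below. The template name is hypothetical, and this only assumes the standard first_notification_delay directive; in NagiosXI the value would normally be entered in the Alert Settings tab instead.

```cfg
# Hypothetical service template that adds no delay before the first notification.
define service {
    name                      immediate-notify-service  ; hypothetical template name
    first_notification_delay  0    ; 0 = do not wait before the first problem notification
    register                  0    ; template only, not a real service
}
```

A non-zero value here would hold back the first notification by that many time units, which is the behavior described above for hosts or services that flap.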
Re: Notification delay of 5 minutes
As far as waiting 0 minutes before sending a notification, are you referring to "first_notification_delay"?
If so, this variable is blank for the services, and a value of 0 is there for the host configs under the Alert Settings tab.
Re: Notification delay of 5 minutes
Correct. It's an optional config directive, so it may not be specified anywhere.
You can also run:
yum install ntp
ntpdate pool.ntp.org
To make sure your system time is correct.
Re: Notification delay of 5 minutes
Right, NTP is one of the first packages we install when building a CentOS machine.
We will have to go through and determine which hosts and services are the most important.
tonyyarusso
Re: Notification delay of 5 minutes
There are actually two things at work here. It is true that first_notification_delay will determine a delay (or lack thereof) between when Nagios determines there is a problem and when it decides to tell you about it. However, I don't think that is what you are dealing with here. (In fact, I think this is set to 0 by default.)
Rather, you are likely running into the fact that Nagios, by default, does not declare a definite problem at the first sign of one. Instead, it tries to verify that something is in fact wrong and that it wasn't just a fluke result, a handy feature when you consider that a notification might wake up a cranky sleeping sysadmin. When the first check result indicating a problem comes in, Nagios sets the host or service to what's called a "soft" problem state. It then switches from checking as often as specified by check_interval to the frequency specified by retry_interval. At this stage it has not sent any notification. If the checks at the retry interval keep coming back bad, then once Nagios has received the number of bad results specified by max_check_attempts, it assigns the host or service a "hard" problem state. It is at this point that a notification is sent. Since retry_interval is 1 minute and max_check_attempts is 5 by default, this explains why the notification occurs 5 minutes after the initial indication of a problem. The solution would be to lower max_check_attempts to 1 (or, if you want to get fancy, you could make retry_interval shorter, but that raises other complications).
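A minimal sketch of that change, assuming a hypothetical host test-server-01 with a single PING service. The parent templates, host name, address, and check command arguments are illustrative only. With max_check_attempts set to 1, the first bad result already puts the object into a hard state, so no retries happen before the notification.

```cfg
# Hypothetical host: the first DOWN result is treated as a hard state.
define host {
    use                 generic-host     ; assumed parent template
    host_name           test-server-01   ; hypothetical host
    address             192.0.2.10       ; documentation-range address
    max_check_attempts  1                ; skip the soft-state retries
}

# Hypothetical service on that host with the same setting.
define service {
    use                  generic-service ; assumed parent template
    host_name            test-server-01
    service_description  PING
    check_command        check_ping!100.0,20%!500.0,60%  ; illustrative thresholds
    max_check_attempts   1               ; first non-OK result is a hard state
}
```

The trade-off is the one described above: a single fluke result now triggers a notification, so this is probably best reserved for hosts and services where a false alarm is cheaper than a late one.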