The fourth host or service filter that must be passed is the time period test. Each host and service definition has a notification_period option that specifies which time period contains valid notification times for the host or service. If the time that the notification is being made does not fall within a valid time range in the specified time period, no one gets contacted. If it falls within a valid time range, the notification gets passed to the next filter... Note: If the time period filter is not passed, Nagios will reschedule the next notification for the host or service (if it's in a non-OK state) for the next valid time present in the time period. This helps ensure that contacts are notified of problems as soon as possible when the next valid time in the time period arrives.
define timeperiod{
timeperiod_name test
alias Except test Hours
monday 08:20-08:15
tuesday 08:20-08:15
wednesday 08:20-08:15
thursday 08:20-08:15
friday 08:20-08:15
saturday 08:20-08:15
sunday 08:20-08:15
}
So if a service goes down between 08:15 and 08:20, I don't get notified by email. But when the service stayed down for another 2 hours, which fell within the valid time period, I still didn't get any notification.
Time periods work on a 24-hour clock. That being said, your current definition (08:20-08:15) would have to wrap over into the next day, and that is not possible. Take a look at the on-call rotation example for an idea of how to use exclusions. You need to create a 24-hour period, exclude 08:15-08:20 from it, and apply the result to your service. http://nagios.sourceforge.net/docs/3_0/ ... ation.html
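As a rough sketch of the exclusion approach described above, the two definitions below build a full-day period and subtract the 08:15-08:20 window from it. The name test-skip-window and both aliases are made up for illustration; only timeperiod_name test comes from the original post. Treat this as a starting point to verify against your Nagios version, not a confirmed fix.

```
# Hypothetical helper period: covers only the window that should be skipped.
define timeperiod{
    timeperiod_name test-skip-window
    alias           Daily 08:15-08:20 window
    monday          08:15-08:20
    tuesday         08:15-08:20
    wednesday       08:15-08:20
    thursday        08:15-08:20
    friday          08:15-08:20
    saturday        08:15-08:20
    sunday          08:15-08:20
    }

# Full 24-hour period with the helper window excluded.
define timeperiod{
    timeperiod_name test
    alias           All hours except 08:15-08:20
    monday          00:00-24:00
    tuesday         00:00-24:00
    wednesday       00:00-24:00
    thursday        00:00-24:00
    friday          00:00-24:00
    saturday        00:00-24:00
    sunday          00:00-24:00
    exclude         test-skip-window
    }
```

An alternative that avoids the exclude directive is to list two ranges per day in a single definition, for example monday 00:00-08:15,08:20-24:00, since a weekday entry accepts comma-separated ranges.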
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
Are you sure the config objects in question were checked in those 2 minutes and were indeed in a HARD failure state?
Former Nagios employee
You're right, I redid the check using a broader time range. But still no go: I don't get notified after the excluded period passes. I do get notified, however, if I manually reschedule the service check or if I reload Nagios.
Here's the service template.
So you are finding that it is not checking at all post downtime, and therefore not posting any further hard states?
After some discussion and code review: provided the state has not changed from a hard warning or critical, the notification will not happen until the next notification interval. The interval counter is still decremented and counted during excluded time periods. This means that if your host/service enters a hard state 1 minute before the excluded notification time ends, it will not notify until the full notification interval has passed, regardless of check results, provided they stay in the same state. However, if your host/service changes state after the excluded time, and you are set to receive notifications for that state, you will be notified.
So what we need to validate is that, if your notification interval is 30 minutes, you are sent a notification for the failing check at most 30 minutes after re-entering notification time.
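For running that validation, a service definition along the following lines could be used. This is only a sketch: the template name, host name, service description, and check command are placeholders, not taken from the poster's configuration. Note that notification_interval is expressed in time units of interval_length, which defaults to 60 seconds, so a value of 30 means 30 minutes on a default install.

```
define service{
    use                   generic-service   ; assumed template name
    host_name             example-host      ; hypothetical host
    service_description   Example Check     ; hypothetical service
    check_command         check_example     ; hypothetical command
    notification_interval 30                ; time units (minutes by default)
    notification_period   test              ; the time period under discussion
    notification_options  w,c,r             ; warning, critical, recovery
    }
```

With this in place, put the service into a hard CRITICAL state shortly before the excluded window ends, then watch the Nagios log to see whether a notification goes out within one interval of the window closing.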
sreinhardt wrote: Provided the state has not changed from a hard warning or critical, the notification will not happen until the next notification interval.
In my tests I had notification_interval set to 120 seconds, and still did not get notified after the excluded timeperiod had passed.
Both in the problem I had in 'production' and in my tests, the service/host did not change state; it remained in a hard CRITICAL state.
I left the alert running overnight, for 16h 30m. It started to send notifications only after the log rotation, which means that for 8 hours (before the log rotation) it didn't send anything.