Not rescheduling notifications if outside valid timeperiod

12csd · Post by **12csd** » Thu Oct 17, 2013 8:01 am

Hi,

In http://nagios.sourceforge.net/docs/3_0/ ... tions.html it is stated:

The fourth host or service filter that must be passed is the time period test. Each host and service definition has a notification_period option that specifies which time period contains valid notification times for the host or service. If the time that the notification is being made does not fall within a valid time range in the specified time period, no one gets contacted. If it falls within a valid time range, the notification gets passed to the next filter... Note: If the time period filter is not passed, Nagios will reschedule the next notification for the host or service (if it's in a non-OK state) for the next valid time present in the time period. This helps ensure that contacts are notified of problems as soon as possible when the next valid time in time period arrives.

This isn't true. I have this time period defined:

Code:

define timeperiod{ 
        timeperiod_name test 
        alias           Except test Hours 
        monday          08:20-08:15 
        tuesday         08:20-08:15 
        wednesday       08:20-08:15 
        thursday        08:20-08:15 
        friday          08:20-08:15 
        saturday        08:20-08:15 
        sunday          08:20-08:15 
        }

So if a service goes down between 08:15 and 08:20, I don't get notified by mail, which is what I want. But the service stayed down for another 2 hours, all of it inside the valid time period, and I still didn't get any notification.

This is the service definition.

Code:

define service{
        use                             generic-service
        service_description             TEST
        check_command                   check_nrpe!check_TEST
        host_name                       some.host.here
        notification_period             test
        }

How can I fix this?
Thanks.

sreinhardt · Post by **sreinhardt** » Thu Oct 17, 2013 2:03 pm

Time periods work on a 24-hour clock. That being said, your current definition (08:20-08:15) would have to wrap past midnight into the next day, and that is not possible. Take a look at the on-call rotation example for an idea of how to use exclusions. You need to create a 24-hour period, exclude 08:15-08:20 from it, and apply the result to your service. http://nagios.sourceforge.net/docs/3_0/ ... ation.html
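
As a rough sketch of that approach, it could look something like the following. The names `quiet-window` and `24x7-minus-quiet` are made up for illustration, the service block reuses the values from your post, and I have not tested this on your version:

```
# Hypothetical name: the window in which notifications should be suppressed
define timeperiod{
        timeperiod_name quiet-window
        alias           Daily quiet window
        sunday          08:15-08:20
        monday          08:15-08:20
        tuesday         08:15-08:20
        wednesday       08:15-08:20
        thursday        08:15-08:20
        friday          08:15-08:20
        saturday        08:15-08:20
        }

# Hypothetical name: the full day, minus the window above
define timeperiod{
        timeperiod_name 24x7-minus-quiet
        alias           24x7 except the quiet window
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
        exclude         quiet-window
        }

define service{
        use                             generic-service
        service_description             TEST
        check_command                   check_nrpe!check_TEST
        host_name                       some.host.here
        notification_period             24x7-minus-quiet
        }
```

Another option that avoids `exclude` altogether is to list two ranges per day in a single time period, for example `monday 00:00-08:15,08:20-24:00`.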

12csd · Post by **12csd** » Sun Oct 20, 2013 5:12 pm

Hi, I tried your suggestion, but no go.

Code:

define timeperiod{
        timeperiod_name 24x7
        name            24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
        }

Code:

define timeperiod{
        timeperiod_name test2
        alias           test2
        use             24x7
        exclude         test
}

Code:

define timeperiod{
        timeperiod_name test
        alias           Except test Hours
        monday          17:57-17:59
        tuesday         17:57-17:59
        wednesday       17:57-17:59
        thursday        17:57-17:59
        friday          17:57-17:59
        saturday        17:57-17:59
        sunday          17:57-17:59
        }

Code:

define service{
        use                             generic-service
        service_description             TEST
        check_command                   check_nrpe!check_TEST
        host_name                       some.host.here
        notification_period             test2
        }

I also set the notification_interval to 60s.
So I got no notification during 17:57-17:59, but still nothing after. Any other suggestions?

abrist · Post by **abrist** » Mon Oct 21, 2013 11:36 am

Are you sure the config objects in question were checked in those 2 minutes and were indeed in a HARD failure state?

12csd · Post by **12csd** » Thu Oct 24, 2013 11:50 am

You're right; I redid the test using a broader time range. But still no go: I don't get notified after the excluded period passes. I do get notified, however, if I manually reschedule the service check OR if I reload nagios.
Here's the service template.

Code:

define service{
        name                            generic-service
        active_checks_enabled           1              
        passive_checks_enabled          1              
        parallelize_check               1              
        obsess_over_service             1              
        check_freshness                 1              
        notifications_enabled           1              
        event_handler_enabled           1              
        flap_detection_enabled          0              
        failure_prediction_enabled      1              
        process_perf_data               1              
        retain_status_information       1              
        retain_nonstatus_information    1              
        is_volatile                     0              
        check_period                    24x7           
        max_check_attempts              2              
        check_interval                  120s           
        retry_interval                  60s            
        contact_groups                  admins         
        notification_options            w,u,c,r        
        notification_interval           120s           
        notification_period             24x7           
        register                        0              
        }

Version is Nagios Core 3.2.3. I am also watching the log and do see the SOFT and HARD alerts, but nothing afterwards.

sreinhardt · Post by **sreinhardt** » Fri Oct 25, 2013 12:33 pm

So you are finding that it is not checking at all post downtime, and therefore not posting any further hard states?

12csd · Post by **12csd** » Fri Oct 25, 2013 1:35 pm

Yes, no more hard states.
However, I believe that after max_check_attempts (which is set to 2) is reached, there won't be any more hard states.

sreinhardt · Post by **sreinhardt** » Fri Oct 25, 2013 2:12 pm

After some discussion and code review: provided the state has not changed from a hard warning or critical, the notification will not happen until the next notification interval. The interval counter keeps counting down during excluded time periods. This means that if your host/service enters a hard state 1 minute before the excluded notification period ends, it will not notify until the full notification interval has passed, regardless of check results, provided they stay in the same state. However, if your host/service changes state after the excluded time, and you are set to receive that type of notification, you will be notified.

So what we need to validate is this: if your notification interval is 30 minutes, you should be sent a notification for the failing check at most 30 minutes after re-entering the notification period.
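
To run that validation, a test service along these lines could be used. This is only a sketch: it reuses the names from this thread, and it assumes `interval_length` in nagios.cfg is still at its default of 60, in which case interval values are plain numbers of minutes. The documentation describes interval values as numbers of time units; whether a suffixed value such as `120s` is read the way you intend is something I can't confirm, so the sketch sticks to a bare number.

```
define service{
        use                             generic-service
        service_description             TEST
        check_command                   check_nrpe!check_TEST
        host_name                       some.host.here
        notification_period             test2
        notification_interval           30      ; time units; 30 minutes if interval_length is 60
        }
```

With this in place, the thing to watch for is a notification no later than 30 minutes after the excluded window ends, while the service stays in the same hard state.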

12csd · Post by **12csd** » Sat Oct 26, 2013 2:35 pm

sreinhardt wrote: Provided the state has not changed from a hard warning or critical, the notification will not happen until the next notification interval.

In my tests I had notification_interval set to 120 seconds, and still did not get notified after the excluded timeperiod had passed.
Both in the problem I had in 'production' and in my tests, the service/host did not change state; it stayed in a hard critical state.

12csd · Post by **12csd** » Sun Oct 27, 2013 7:16 am

I left the alert running overnight, and it ran for 16h 30m. It started sending notifications only after the log rotation, which means that for 8 hours (before the log rotation) it didn't send anything.
