Notification not sent on Recovery after Scheduled Downtime

mrussi
Posts: 6
Joined: Thu Sep 07, 2017 2:24 pm

Post by mrussi »

Hello,

We are running Nagios 4.3.1 and have found that it does not send a notification when a service recovers after scheduled downtime has ended. With debug output enabled for notifications, Nagios logs "We shouldn't notify about this recovery" when it performs the notification viability test at the time of recovery.

We believe this has been happening ever since we enabled Scheduled Downtime notifications many months ago, but we did not notice until it began to show up in our reports. It happens for every service that enters Scheduled Downtime in an OK state, goes to WARN/CRIT during the downtime, and is still in that WARN/CRIT state when the downtime ends. The return of the service to OK is never notified, which breaks our automated resolution process.

Looking through the source, I found two occurrences of this debug message. I assume this happens because the svc->notified_on flag is either not set or set to 0?

https://github.com/NagiosEnterprises/na ... #L540-L543
https://github.com/NagiosEnterprises/na ... #L703-L707
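
To make the hypothesis above concrete, here is a minimal sketch of how a recovery check gated on a notified_on-style bitmask would behave. It is a simplified model, not the Nagios source: the flag values, the Service class, and all function names are hypothetical, and it assumes that only NORMAL problem notifications record a state in the mask (DOWNTIMESTART and DOWNTIMEEND do not).

```python
from dataclasses import dataclass

# Hypothetical bit flags, for illustration only (not the values Nagios uses).
OPT_WARNING = 1
OPT_CRITICAL = 2
OPT_UNKNOWN = 4
PROBLEM_MASK = OPT_WARNING | OPT_CRITICAL | OPT_UNKNOWN


@dataclass
class Service:
    in_downtime: bool = False
    notified_on: int = 0  # problem states a NORMAL notification was sent for


def notify_problem(svc: Service, state_flag: int) -> bool:
    """Send a NORMAL problem notification unless the service is in downtime."""
    if svc.in_downtime:
        return False  # suppressed, so nothing is recorded in notified_on
    svc.notified_on |= state_flag
    return True


def recovery_is_viable(svc: Service) -> bool:
    """Assumed rule: announce a recovery only if a problem was announced first."""
    return bool(svc.notified_on & PROBLEM_MASK)


def replay_timeline() -> bool:
    """Replay the sequence from this post and report whether recovery notifies."""
    svc = Service()
    svc.in_downtime = True             # downtime starts while the service is OK
    notify_problem(svc, OPT_CRITICAL)  # CRITICAL during downtime: suppressed
    svc.in_downtime = False            # downtime ends; DOWNTIMEEND is not a NORMAL notification
    return recovery_is_viable(svc)     # service returns to OK


print(replay_timeline())
```

Under these assumptions the mask is still 0 when the service recovers, so the recovery is rejected, which would match the "We shouldn't notify about this recovery" message.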

Any idea what we might have misconfigured?

---------------------------------------------------------------------------------------------

Example timeline:

Wednesday, April 1, 2020 6:59:59 PM GMT-04:00 DST
- Service is in OK state. Service enters Scheduled Downtime. Notification is sent.

Wednesday, April 1, 2020 7:35:07 PM GMT-04:00 DST
- Check goes into CRITICAL state while in Scheduled Downtime. Notification is not sent, as expected.

Thursday, April 2, 2020 7:05:00 AM GMT-04:00 DST
- Scheduled Downtime ends. Notification is sent that the check is in a CRITICAL state.

Thursday, April 2, 2020 7:05:08 AM GMT-04:00 DST
- Check goes into OK state but no notification is sent. "We shouldn't notify about this recovery."

Code:

[1585781999] SERVICE DOWNTIME ALERT: host150;Check_Process_Nodename_1__TP2_host150;STARTED; Service has entered a period of scheduled downtime
[1585781999] SERVICE NOTIFICATION: nagiosadmin;host150;Check_Process_Nodename_1__TP2_host150;DOWNTIMESTART (OK);notify-service-by-webhook;PROCS OK: 1 process with args '...'
[1585784107] SERVICE ALERT: host150;Check_Process_Nodename_1__TP2_host150;CRITICAL;HARD;1;PROCS CRITICAL: 0 processes with args '...'
[1585825499] SERVICE DOWNTIME ALERT: host150;Check_Process_Nodename_1__TP2_host150;STOPPED; Service has exited from a period of scheduled downtime
[1585825500] SERVICE NOTIFICATION: nagiosadmin;host150;Check_Process_Nodename_1__TP2_host150;DOWNTIMEEND (CRITICAL);notify-service-by-webhook;PROCS CRITICAL: 0 processes with args '...'
[1585825508] SERVICE ALERT: host150;Check_Process_Nodename_1__TP2_host150;OK;HARD;1;PROCS OK: 1 process with args '...'
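
For anyone matching the log entries above against the example timeline, the bracketed values are Unix epoch seconds. The sketch below converts them to the UTC-4 local time used in the timeline; the fixed offset and the to_local helper name are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-4 offset, matching the "GMT-04:00 DST" times in the timeline.
EDT = timezone(timedelta(hours=-4))


def to_local(epoch: int) -> str:
    """Format a Nagios log timestamp (epoch seconds) as local wall-clock time."""
    return datetime.fromtimestamp(epoch, tz=EDT).strftime("%Y-%m-%d %H:%M:%S")


# Timestamps copied from the log lines above.
for ts in (1585781999, 1585784107, 1585825499, 1585825500, 1585825508):
    print(ts, to_local(ts))
```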
debug logs:

Code:

[1585825499.999996] [032.0] [pid=62128] ** Service Notification Attempt ** Host: 'host150', Service: 'Check_Process_Nodename_1__TP2_host150', Type: DOWNTIMEEND, Options: 0, Current State: 2, Last Notification: Wed Dec 31 19:00:00 1969
[1585825500.000060] [032.0] [pid=62128] Notification viability test passed.
[1585825500.000062] [032.1] [pid=62128] Current notification number: 0 (unchanged)
[1585825500.000065] [032.2] [pid=62128] Creating list of contacts to be notified.
[1585825500.000067] [032.1] [pid=62128] Service notification will NOT be escalated.
[1585825500.000070] [032.1] [pid=62128] Adding normal contacts for service to notification list.
[1585825500.000082] [032.2] [pid=62128] Adding members of contact group 'nagiosadmin' for service to notification list.
[1585825500.000085] [032.2] [pid=62128] ** Checking service notification viability for contact 'nagiosadmin'...
[1585825500.000091] [032.2] [pid=62128] Adding contact 'nagiosadmin' to notification list.
[1585825500.000105] [032.2] [pid=62128] ** Notifying contact 'nagiosadmin'
[1585825500.000110] [032.2] [pid=62128] Raw notification command: /usr/bin/notification ...
[1585825500.000163] [032.0] [pid=62128] 1 contacts were notified.

[1585825508.282782] [032.0] [pid=62128] ** Service Notification Attempt ** Host: 'host150', Service: 'Check_Process_Nodename_1__TP2_host150', Type: NORMAL, Options: 0, Current State: 0, Last Notification: Wed Dec 31 19:00:00 1969
[1585825508.282883] [032.1] [pid=62128] We shouldn't notify about this recovery.
[1585825508.282887] [032.0] [pid=62128] Notification viability test failed.  No notification will be sent out.
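
When many services are affected, it can help to pull every rejected attempt out of the debug log along with the message logged just before the rejection. The following is a rough sketch that relies only on the line shapes shown above; the regular expression and the failed_attempts helper are hypothetical and may need adjusting for other Nagios versions or debug levels.

```python
import re

# Matches the "Service Notification Attempt" lines as they appear above.
ATTEMPT = re.compile(
    r"\*\* Service Notification Attempt \*\* Host: '(?P<host>[^']*)', "
    r"Service: '(?P<service>[^']*)', Type: (?P<type>\w+),"
)
FAILED = "Notification viability test failed"


def failed_attempts(lines):
    """Yield (host, service, type, reason) for attempts that failed viability.

    The reason is the last debug message seen between the attempt line and
    the failure line, or an empty string if there was none.
    """
    current = None
    reason = ""
    for line in lines:
        match = ATTEMPT.search(line)
        if match:
            current = (match["host"], match["service"], match["type"])
            reason = ""
        elif current and FAILED in line:
            yield (*current, reason)
            current = None
        elif current and line.strip():
            # The message text follows the three bracketed prefixes.
            reason = line.split("] ", 3)[-1].strip()


sample = [
    "[1585825508.282782] [032.0] [pid=62128] ** Service Notification Attempt ** "
    "Host: 'host150', Service: 'Check_Process_Nodename_1__TP2_host150', "
    "Type: NORMAL, Options: 0, Current State: 0, "
    "Last Notification: Wed Dec 31 19:00:00 1969",
    "[1585825508.282883] [032.1] [pid=62128] We shouldn't notify about this recovery.",
    "[1585825508.282887] [032.0] [pid=62128] Notification viability test failed.  "
    "No notification will be sent out.",
]

for row in failed_attempts(sample):
    print(row)
```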
contact.cfg

Code:

define contact {
    contact_name                    nagiosadmin
    alias                           nagiosadmin
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    c,r,w,u,f,s
    host_notification_options       d,r,u,f,s
    service_notification_commands   notify-service-by-webhook
    host_notification_commands      notify-host-by-webhook
    host_notifications_enabled      1
    service_notifications_enabled   1
}
contact_groups.cfg

Code:

define contactgroup {
        contactgroup_name               admin
        alias                           Administrators
        members                         nagiosadmin,linuxadmins
}
services.cfg

Code:

define service {
        use                           PRD Service Template
        contacts                      appadmin
        notification_period           07:05-19:00 MTWTFxx
        check_command                 check_nrpe!5659!check_procs -a '...'
        contact_groups                admin
        host_name                     host150
        service_description           Check_Process_Nodename_1__TP2_host150
}
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Notification not sent on Recovery after Scheduled Downtime

Post by cdienger »

I'd recommend updating to the latest release. There have been several fixes to recovery notifications, including this one in 4.4.3:
Fixed escalation notifications logic and recovery notifications not going out (#582)
https://www.nagios.org/projects/nagios-core/history/4x/