
Scheduled Downtime is still sending notifications

Posted: Tue Aug 25, 2015 9:24 am
by cesarpball
Hello nagios geeks!

I've recently upgraded one nagios box from 3.8 to 4.0.8.
Yesterday and today we had patching interventions, so I scheduled downtime for all the services and the host. However, I was still alerted for one service when that service went down!
This happened yesterday too. It's quite odd. Any suggestions?

Code: Select all


[1440500468] SERVICE DOWNTIME ALERT: ad.stage;windows_services;STARTED; Service has entered a period of scheduled downtime
[1440502528] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;1;CRITICAL:
[1440502558] SERVICE ALERT: ad.stage;windows_services;OK;SOFT;2;OK: All services are in their appropriate state.
[1440502852] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;1;CRITICAL: 
[1440502858] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;1;CRITICAL: 
[1440502882] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;2;CRITICAL: 
[1440502888] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;2;CRITICAL: 
[1440502913] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;3;CRITICAL: 
[1440502918] SERVICE ALERT: ad.stage;windows_services;CRITICAL;SOFT;3;CRITICAL: 
[1440502942] SERVICE ALERT: ad.stage;windows_services;CRITICAL;HARD;4;CRITICAL: 
[1440502948] SERVICE ALERT: ad.stage;windows_services;CRITICAL;HARD;4;CRITICAL: 
[1440502948] SERVICE NOTIFICATION: notification-email-alert;ad.stage;windows_services;CRITICAL;notify-by-email;CRITICAL:
[1440503251] SERVICE ALERT: ad.stage;windows_services;OK;HARD;4;OK: All services are in their appropriate state.
[1440503253] SERVICE ALERT: ad.stage;windows_services;OK;HARD;4;OK: All services are in their appropriate state.
[1440503253] SERVICE NOTIFICATION: notification-email-alert;ad.stage;windows_services;OK;notify-by-email;OK: All services are in their appropriate state.
[1440509457] SERVICE DOWNTIME ALERT: ad.stage;windows_services;STOPPED; Service has exited from a period of scheduled downtime
Any help?

Re: Scheduled Downtime is still sending notifications

Posted: Tue Aug 25, 2015 1:06 pm
by tgriep
The duplicate service alerts happening seconds apart suggest that the configs were duplicated when the upgrade happened.
Can you check the config settings and see if that happened?
Can you run the following and post the output here?

Code: Select all

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
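
In addition to the pre-flight check, one rough way to look for duplicated configs is to list the `cfg_file` and `cfg_dir` entries in the main config file and flag any that appear more than once. This is only a sketch: `duplicate_cfg_sources` is a hypothetical helper name, and it catches only entries that are repeated verbatim, not copies of the same object saved under a different file name.

```shell
# Hypothetical helper: print cfg_file/cfg_dir entries that occur more than
# once in a Nagios main config file (path given as the first argument).
duplicate_cfg_sources() {
    grep -E '^[[:space:]]*cfg_(file|dir)[[:space:]]*=' "$1" |
        sed -e 's/^[[:space:]]*//' \
            -e 's/[[:space:]]*=[[:space:]]*/=/' \
            -e 's/[[:space:]]*$//' |
        sort | uniq -d
}

# Example call, assuming the same path as the command above:
# duplicate_cfg_sources /usr/local/nagios/etc/nagios.cfg
```

No output means no entry is listed twice. Commented-out lines are ignored because the pattern only matches lines that begin with `cfg_file` or `cfg_dir`.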

Re: Scheduled Downtime is still sending notifications

Posted: Wed Aug 26, 2015 10:39 am
by cesarpball
Thanks for your reply,
This is the output

Code: Select all

# /usr/bin/nagios -v /etc/nagios/nagios.cfg

Nagios Core 4.0.8
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-12-2014
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 100 services.
        Checked 305 hosts.
        Checked 299 host groups.
        Checked 103 service groups.
        Checked 707 contacts.
        Checked 300 contact groups.
        Checked 312 commands.
        Checked 10 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 305 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 10 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
I am trying to identify any duplicated configuration file, but I can't find one.

Re: Scheduled Downtime is still sending notifications

Posted: Wed Aug 26, 2015 5:06 pm
by ssax
Please post a sanitized copy of your service definition and any templates that it uses.

Re: Scheduled Downtime is still sending notifications

Posted: Fri Aug 28, 2015 3:26 am
by cesarpball

Code: Select all


define service {
        service_description     windows_services
        display_name            Windows Services
        check_command           check_nrpe!check_services
        use                     generic-service
        hostgroup_name          windows
        _criticality            medium
}

I use this template on all the Nagios servers that I have, but it is failing on only one of them.

Code: Select all

# generic service template definition
define service{
        use                             remote
        name                            generic-service ; The 'name' of this service template
        ;active_checks_enabled           1       ; Active service checks are enabled
        ;passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        ;obsess_over_service             1       ; We should obsess over this service (if necessary)
        ;check_freshness                 1       ; Default is to NOT check service 'freshness'
        ;freshness_threshold            900
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
                notification_interval           15              ; Only send notifications on status change by default.
                is_volatile                     0
                ;check_period                    24x7
                normal_check_interval           10
                retry_check_interval            1
                max_check_attempts              4
                notification_period             24x7
                notification_options            u,c,r,f
                #contact_groups                  admins
        _nrpecheck      check_nrpe
        _criticality    normal
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

Today I got another alert:

Code: Select all

[1440745254] SERVICE DOWNTIME ALERT: windows_pet_serv;windows_services;STARTED; Service has entered a period of scheduled downtime
[1440745741] SERVICE ALERT: windows_pet_serv;windows_services;UNKNOWN;SOFT;1;CHECK_NRPE: Socket timeout after 30 seconds.
[1440745744] SERVICE ALERT: windows_pet_serv;windows_services;CRITICAL;SOFT;2;Connection refused by host
[1440745771] SERVICE ALERT: windows_pet_serv;windows_services;CRITICAL;SOFT;3;Connection refused by host
[1440745801] SERVICE ALERT: windows_pet_serv;windows_services;CRITICAL;HARD;4;Connection refused by host
[1440745818] SERVICE ALERT: windows_pet_serv;windows_services;CRITICAL;SOFT;1;Connection refused by host
[1440745863] SERVICE ALERT: windows_pet_serv;windows_services;CRITICAL;SOFT;2;CRITICAL: SERVICE1: stopped (critical), SERVICE2: stopped (critical), SERVICE3: stopped (critical)
[1440745878] SERVICE ALERT: windows_pet_serv;windows_services;CRITICAL;SOFT;3;CRITICAL: SERVICE1: stopped (critical), SERVICE3: stopped (critical)
[1440745908] SERVICE ALERT: windows_pet_serv;windows_services;CRITICAL;HARD;4;CRITICAL: SERVICE1: stopped (critical), SERVICE3: stopped (critical)
[1440745908] SERVICE NOTIFICATION: internalmonitor-alert;windows_pet_serv;windows_services;CRITICAL;service-notify-by-email2;CRITICAL: SERVICE1: stopped (critical), UALSVC: stopped (critical)
[1440745908] SERVICE NOTIFICATION: internalmonitor-alert;windows_pet_serv;windows_services;CRITICAL;notify-by-email;CRITICAL: SERVICE1: stopped (critical), UALSVC: stopped (critical)
[1440746101] SERVICE ALERT: windows_pet_serv;windows_services;OK;HARD;4;OK: All services are in their appropriate state.
[1440746208] SERVICE ALERT: windows_pet_serv;windows_services;OK;HARD;4;OK: All services are in their appropriate state.
[1440746208] SERVICE NOTIFICATION: internalmonitor-alert;windows_pet_serv;windows_services;OK;service-notify-by-email2;OK: All services are in their appropriate state.
[1440746208] SERVICE NOTIFICATION: internalmonitor-alert;windows_pet_serv;windows_services;OK;notify-by-email;OK: All services are in their appropriate state.

Thanks very much for all your help!

Re: Scheduled Downtime is still sending notifications

Posted: Fri Aug 28, 2015 12:22 pm
by tgriep
When downtime is scheduled, it is normal for alerts to be logged, but you should not receive a notification (email) during this time.
Did you receive an email when that service was down during the scheduled downtime?
Here is a quick description of Scheduled Downtime.
https://assets.nagios.com/downloads/nag ... ntime.html
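
To tell the two apart in the log, a small filter can print only the `SERVICE NOTIFICATION` lines that were written while the same host and service were inside a downtime window. This is a sketch under assumptions: `notifications_in_downtime` is a hypothetical helper name, it relies on the log line layout shown earlier in this thread, and it assumes that downtime start/end notifications carry a type beginning with `DOWNTIME` in the state field, which it skips.

```shell
# Hypothetical helper: print SERVICE NOTIFICATION lines logged while the
# same host;service pair was inside a scheduled downtime window.
notifications_in_downtime() {
    awk -F': ' '
        /SERVICE DOWNTIME ALERT/ {
            # Fields after the colon: host;service;STARTED|STOPPED|...;text
            split($2, f, ";")
            key = f[1] ";" f[2]
            if (f[3] == "STARTED") active[key] = 1
            else delete active[key]
            next
        }
        /SERVICE NOTIFICATION/ {
            # Fields after the colon: contact;host;service;state;command;output
            split($2, f, ";")
            key = f[2] ";" f[3]
            # Assumption: downtime start/end notices have a state field
            # beginning with DOWNTIME; those are expected, so skip them.
            if ((key in active) && f[4] !~ /^DOWNTIME/) print
        }
    ' "$@"
}

# Example call; the log path is an assumption and varies by install:
# notifications_in_downtime /usr/local/nagios/var/nagios.log
```

Any line this prints is a notification that went out during downtime, which is the behavior that should not happen. `SERVICE ALERT` lines are ignored, since those are expected.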

Re: Scheduled Downtime is still sending notifications

Posted: Tue Sep 01, 2015 4:24 am
by cesarpball
Yep,

That's the problem. We are getting emails while the service is in scheduled downtime, but only for one of the monitored services (not all of them)!

Re: Scheduled Downtime is still sending notifications

Posted: Tue Sep 01, 2015 11:49 am
by tgriep
There is another template called "remote". Can you post that?

Re: Scheduled Downtime is still sending notifications

Posted: Wed Sep 02, 2015 3:11 am
by cesarpball
Hello,

This is the other one:

Code: Select all

# cat remote/service-remote.cfg
define service {
        name                    remote
        active_checks_enabled   1
        passive_checks_enabled  1
        check_period            24x7
        check_freshness         0
        _nagios_url             https://nagios.myserver.net/nagios/cgi-bin/extinfo.cgi?type=2&
        freshness_threshold     900
        register                0
}
Could the problem be in this one?

Thanks

Re: Scheduled Downtime is still sending notifications

Posted: Wed Sep 02, 2015 3:43 pm
by tgriep
Your configs look good so far. Let's try stopping the nagios process to see if there is a stuck process still running.
Run this in a shell:

Code: Select all

service nagios stop
killall -9 nagios
service nagios start
If this doesn't work, I will need all of the config files to find it.
You can PM the files to me if you like.
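
Before killing anything, it may help to check whether more than one top-level nagios process is running. The sketch below is an assumption-heavy illustration: `nagios_top_level_pids` is a hypothetical helper name, and it assumes the daemon detaches so that its parent is PID 1 while its worker processes are children of the main process. That may not hold under some process supervisors or inside containers.

```shell
# Hypothetical helper: read "PID PPID COMMAND" lines on stdin and print the
# PIDs of processes named nagios whose parent is PID 1.
nagios_top_level_pids() {
    awk '$3 == "nagios" && $2 == 1 { print $1 }'
}

# Example call (the trailing "=" on each column suppresses the header line):
# ps -eo pid=,ppid=,comm= | nagios_top_level_pids
```

If this prints more than one PID, a second or leftover instance may be processing the same checks, which would fit the duplicated alerts seen seconds apart.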