
Scheduled Downtime keeps being set by script

Posted: Thu Oct 18, 2018 12:34 am
by kwhogster
Nagios 4.3.4
Nsclient 4.4.3
Windows 2012 R2, Windows 2016, Windows 2019, Windows 10, Windows 8.1

I created a fake service, set to manual start, on all machines running the operating systems above.
When I apply updates to the machines, I want to start the fake service, named Patches.
When Patches is started, I run a script on the Nagios server that sets the downtime.

That all works fine.
My problem is that it keeps scheduling a downtime over and over, even after the server or desktop has been restarted and the fake service is stopped.

Here are my entries

Code: Select all

# patches.proto
define service{
        use                     generic-service
        host_name               hostname
        service_description     Patching Service Check
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        check_interval          5
        retry_interval          1
        contacts                noalerts
        notification_interval   0
        notification_period     24x7
        notification_options    n
        check_command           check_nrpe!check_winservice! -a '--service Patches'
        check_period            backup_period
        notification_period     backup_period
        servicegroups           Services
        }

define service{
        use                     generic-service
        host_name               hostname
        service_description     Schedule Patching Downtime
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        check_interval          5
        retry_interval          1
        contacts                noalerts
        notification_interval   0
        notification_period     24x7
        notification_options    w,u,c,r
        check_command           patching_downtime
        check_period            backup_period
        notification_period     backup_period
        servicegroups           Services
        }

define servicedependency {
        host_name                       hostname
        service_description             Patching Service Check
        dependent_service_description   Schedule Patching Downtime
        execution_failure_criteria      u,c,p,w
        notification_failure_criteria   u,c,p,w
        dependency_period               24x7
        servicegroups                   Services
}

Code: Select all

#!/bin/sh
# This is a sample shell script showing how you can submit the SCHEDULE_HOST_DOWNTIME command
# to Nagios. Adjust variables to fit your environment as necessary.
#
# SCHEDULE_HOST_DOWNTIME;<host_name>;<start_time>;<end_time>;<fixed>;<trigger_id>;<duration>;<author>;<comment>
#

now=`date +%s`
end=`date --date='1 hour' +%s`
commandfile='/usr/local/nagios/var/rw/nagios.cmd'
host=$1

/bin/printf "[%lu] SCHEDULE_HOST_DOWNTIME;%s;%lu;%lu;1;0;600;Patching Svc;Monitoring is disabled due to patching\n" ${now} ${host} ${now} ${end} > $commandfile


root@tgcs017:/usr/local/nagios/etc/objects/windowsservers#

Code: Select all

# 'patching_downtime.sh' command definition
define command{
    command_name        patching_downtime
    command_line        /usr/local/nagios/libexec/patching_downtime.sh
}

So, in summary: when the Patches service is stopped, no action should take place. As soon as the service is started, the downtime should be scheduled.

Any thoughts?

Thank you in advance

Tom

Re: Scheduled Downtime keeps being set by script

Posted: Fri Oct 19, 2018 10:36 am
by ssax
I set this up a different way, try this:

Code: Select all

define service{
        use                     generic-service
        host_name               hostname
        service_description     Patching Service Check
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        check_interval          5
        retry_interval          1
        contacts                noalerts
        notification_interval   0
        notification_period     24x7
        notification_options    n
        check_command           check_nrpe!check_winservice! -a '--service Patches'
        check_period            backup_period
        notification_period     backup_period
        event_handler           patching_downtime
        event_handler_enabled   1
        servicegroups           Services
        }

Code: Select all

define command {
    command_name    patching_downtime
    command_line    $USER1$/patching_downtime.sh '$HOSTNAME$' $SERVICESTATEID$ $HOSTDOWNTIME$
}

Code: Select all

#!/usr/bin/sh
# This is a sample shell script showing how you can submit the SCHEDULE_HOST_DOWNTIME command
# to Nagios. Adjust variables to fit your environment as necessary.
#
# SCHEDULE_HOST_DOWNTIME;<host_name>;<start_time>;<end_time>;<fixed>;<trigger_id>;<duration>;<author>;<comment>
#

now=`date +%s`
end=`date --date='1 hour' +%s`
commandfile='/usr/local/nagios/var/rw/nagios.cmd'
host="$1"
SERVICESTATEID="$2"
HOSTDOWNTIME="$3"
if [ "$SERVICESTATEID" = "0" ] && [ "$HOSTDOWNTIME" -eq "0" ]; then
    /bin/printf "[%lu] SCHEDULE_HOST_DOWNTIME;%s;%lu;%lu;1;0;600;Patching Svc;Monitoring is disabled due to patching\n" ${now} ${host} ${now} ${end} > $commandfile
fi

exit 0

The way this works: the service is always in a CRITICAL state when you're not patching, and it returns an OK state when you start the fake service. The event handler runs on every state change, and the script submits the downtime only if the current service state is OK (meaning your fake service is now running) AND the host is NOT already in downtime.
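
To make the condition concrete, here is a minimal sketch of the guard on its own. The function name `should_schedule_downtime` is made up for illustration and is not part of the script above. It assumes the two values are passed in the same order as in the command definition: `$SERVICESTATEID$` first, then `$HOSTDOWNTIME$`.

```shell
#!/bin/sh
# Sketch of the guard used in patching_downtime.sh (function name is hypothetical).
# $1 = $SERVICESTATEID$ (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN)
# $2 = $HOSTDOWNTIME$   (downtime depth; 0 means the host is not in downtime)
should_schedule_downtime() {
    [ "$1" = "0" ] && [ "$2" -eq 0 ] 2>/dev/null
}

# Walk through three combinations and print the decision for each.
for args in "0 0" "2 0" "0 1"; do
    set -- $args
    if should_schedule_downtime "$1" "$2"; then
        echo "state=$1 downtime=$2 -> schedule"
    else
        echo "state=$1 downtime=$2 -> skip"
    fi
done
```

Only the first combination (service OK, host not in downtime) should result in a downtime being submitted.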

Will that work for you?

Re: Scheduled Downtime keeps being set by script

Posted: Sat Oct 20, 2018 12:05 pm
by kwhogster
I now have this

Code: Select all

# 'patching_downtime.sh' command definition
define command{
    command_name        patching_downtime
    command_line        /usr/local/nagios/libexec/patching_downtime.sh '$HOSTNAME$' $SERVICESTATEID$ $HOSTDOWNTIME$
}
define command{
        command_name    check_win_patches
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c check_service -a service=Patches "crit=state = 'stopped'" warn=none
        }

Code: Select all

# patches.proto
define service{
        use                     generic-service
        host_name               hostname
        service_description     Patching Service Check
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        check_interval          5
        retry_interval          1
        contacts                noalerts
        notification_interval   0
        notification_period     24x7
        notification_options    n
        check_command           check_win_patches
        event_handler           patching_downtime
        event_handler_enabled   1
        servicegroups           Windows Updates
        }

define servicedependency {
        host_name                       hostname
        service_description             NRPE Status
        dependent_service_description   Patching Service Check
        execution_failure_criteria      u,c,p,w
        notification_failure_criteria   u,c,p,w
        dependency_period               24x7
}

Code: Select all

#!/bin/sh
# This is a sample shell script showing how you can submit the SCHEDULE_HOST_DOWNTIME command
# to Nagios. Adjust variables to fit your environment as necessary.
#
# SCHEDULE_HOST_DOWNTIME;<host_name>;<start_time>;<end_time>;<fixed>;<trigger_id>;<duration>;<author>;<comment>
#

now=`date +%s`
end=`date --date='1 hour' +%s`
commandfile='/usr/local/nagios/var/rw/nagios.cmd'
host=$1
SERVICESTATEID="$2"
HOSTDOWNTIME="$3"

if [ "$SERVICESTATEID" = "0" ] && [ "$HOSTDOWNTIME" -eq "0" ]; then
   /usr/bin/printf "[%lu] SCHEDULE_HOST_DOWNTIME;%s;%lu;%lu;1;0;600;Patching Svc;Monitoring is disabled due to patching\n" ${now} ${host} ${now} ${end} > $commandfile
fi

exit 0

I changed my mind on this. The Patches service should always be running, and when it stops I want to trigger the event.
Also, I had to change check_winservice to check_service; check_winservice was not producing the return codes I expected.


When I stop the service, it goes to CRITICAL.

But it never schedules downtime. I tailed the Nagios log, hope that helps you:

Code: Select all

root@tgcs017:/usr/local/nagios/etc/objects# tail /usr/local/nagios/var/nagios.log
[1540053576] Successfully launched command file worker with pid 28862
[1540053728] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;TGKW012;WSUS;1540053726
[1540053803] SERVICE ALERT: TGKW012;WSUS;UNKNOWN;SOFT;1;Command check_updates didn't terminate within the timeout period 60s
[1540053811] SERVICE ALERT: TGKW012;Patching Service Check;CRITICAL;SOFT;1;CRITICAL: Patches=stopped (auto), delayed ()
[1540053811] SERVICE EVENT HANDLER: TGKW012;Patching Service Check;CRITICAL;SOFT;1;patching_downtime
[1540053852] SERVICE ALERT: TGKW012;WSUS;OK;SOFT;2;scripts\check_windows_updates.ps1 : The module 'scripts' could not be loaded.
[1540053871] SERVICE ALERT: TGKW012;Patching Service Check;CRITICAL;SOFT;2;CRITICAL: Patches=stopped (auto), delayed ()
[1540053871] SERVICE EVENT HANDLER: TGKW012;Patching Service Check;CRITICAL;SOFT;2;patching_downtime
[1540053931] SERVICE ALERT: TGKW012;Patching Service Check;CRITICAL;HARD;3;CRITICAL: Patches=stopped (auto), delayed ()
[1540053931] SERVICE EVENT HANDLER: TGKW012;Patching Service Check;CRITICAL;HARD;3;patching_downtime

I tried running the script manually and I do not see anything either:
root@tgcs017:/usr/local/nagios/libexec# ./patching_downtime.sh TGKW012
root@tgcs017:/usr/local/nagios/libexec#




I'm a little puzzled as to why it is still not working.

Any ideas?


Thank you,

Re: Scheduled Downtime keeps being set by script

Posted: Sun Oct 21, 2018 9:53 pm
by kwhogster
Is there any way I can find out what codes are being sent to the script?

When I run the script
./patching_downtime.sh testhost 0 0

That scheduled the downtime.

But when I stop the Patches service, nothing happens.

I tailed nagios.log, but I see no activity.

Any ideas?
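
One possible way to see what the script receives is to have it log its arguments before doing anything else. The following is only a sketch: the helper name `log_handler_args` and the log path in the example call are assumptions, not part of the original script. It also shows why running the script with only a host name does nothing: the second and third arguments are empty, so the `if` condition is false.

```shell
#!/bin/sh
# Hypothetical debug helper: append the arguments the handler received to a log file.
# $1 = log file path; remaining arguments = host name, service state ID, host downtime depth.
log_handler_args() {
    logfile=$1
    shift
    printf '%s host=%s state_id=%s host_downtime=%s\n' \
        "$(date +%s)" "${1:-unset}" "${2:-unset}" "${3:-unset}" >> "$logfile"
}

# Example call, as it might appear near the top of patching_downtime.sh.
# The log path is an assumption; use one the nagios user can write to.
# log_handler_args /tmp/patching_downtime.debug "$@"
```

After the next state change, the log file should contain one line per event handler run, showing the exact values Nagios passed in.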

Re: Scheduled Downtime keeps being set by script

Posted: Mon Oct 22, 2018 1:46 pm
by scottwilkerson
I can see the patching_downtime event handler is being triggered

Code: Select all

[1540053811] SERVICE EVENT HANDLER: TGKW012;Patching Service Check;CRITICAL;SOFT;1;patching_downtime

But your code only executes if $SERVICESTATEID is 0, which it would not be for the CRITICAL state shown above; it would be 2.
kwhogster wrote:

Code: Select all

if [ "$SERVICESTATEID" = "0" ] && [ "$HOSTDOWNTIME" -eq "0" ]; then
   /usr/bin/printf "[%lu] SCHEDULE_HOST_DOWNTIME;%s;%lu;%lu;1;0;600;Patching Svc;Monitoring is disabled due to patching\n" ${now} ${host} ${now} ${end} > $commandfile
fi

Re: Scheduled Downtime keeps being set by script

Posted: Mon Oct 22, 2018 8:20 pm
by kwhogster
Thanks

Changing to a 2 did the trick
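
For anyone finding this later, here is a rough sketch of what the adjusted logic looks like when split into functions. The function names are made up for illustration, and the output is printed rather than written to the command file so it can be tried safely. The timestamp in the example call is taken from the log excerpt earlier in the thread.

```shell
#!/bin/sh
# Sketch of the adjusted handler logic; function names are hypothetical.
CRITICAL=2   # $SERVICESTATEID$ value for CRITICAL

# Print the external command line for a fixed one-hour host downtime.
build_downtime_command() {
    host=$1 now=$2 end=$3
    printf '[%s] SCHEDULE_HOST_DOWNTIME;%s;%s;%s;1;0;600;Patching Svc;Monitoring is disabled due to patching\n' \
        "$now" "$host" "$now" "$end"
}

# Emit the command only when the service is CRITICAL and the host is not in downtime.
# $1 = host name, $2 = service state ID, $3 = host downtime depth, $4 = current epoch time
handle_event() {
    host=$1 state_id=$2 host_downtime=$3 now=$4
    if [ "$state_id" = "$CRITICAL" ] && [ "$host_downtime" -eq 0 ]; then
        build_downtime_command "$host" "$now" "$((now + 3600))"
    fi
}

handle_event TGKW012 2 0 1540053811
```

In the real script, the output would be redirected to the command file instead, for example `handle_event "$1" "$2" "$3" "$(date +%s)" > /usr/local/nagios/var/rw/nagios.cmd`.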


This can now be locked as resolved.