Scheduled Downtime keeps being set by script

kwhogster
Posts: 644
Joined: Wed Oct 14, 2015 6:51 pm
Location: Wood Ridge NJ USA
Contact:

Scheduled Downtime keeps being set by script

Post by kwhogster »

Nagios 4.3.4
Nsclient 4.4.3
Windows 2012 R2, Windows 2016, Windows 2019, Windows 10, Windows 8.1

I created a fake Windows service named Patches, set to manual start, on all machines running the operating systems listed above.
When I apply updates to a machine, I want to start the Patches service.
When Patches is started, a script on the Nagios server schedules the downtime.

That all works fine.
My problem is that it keeps scheduling downtime over and over, even after the server or desktop has been restarted and the fake service is stopped.

Here are my entries:

Code: Select all

# patches.proto
define service{
        use                     generic-service
        host_name               hostname
        service_description     Patching Service Check
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        check_interval          5
        retry_interval          1
        contacts                noalerts
        notification_interval   0
        notification_period     24x7
        notification_options    n
        check_command           check_nrpe!check_winservice! -a '--service Patches'
        check_period            backup_period
        notification_period     backup_period
        servicegroups           Services
        }

define service{
        use                     generic-service
        host_name               hostname
        service_description     Schedule Patching Downtime
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        check_interval          5
        retry_interval          1
        contacts                noalerts
        notification_interval   0
        notification_period     24x7
        notification_options    w,u,c,r
        check_command           patching_downtime
        check_period            backup_period
        notification_period     backup_period
        servicegroups           Services
        }

define servicedependency {
        host_name                       hostname
        service_description             Patching Service Check
        dependent_service_description   Schedule Patching Downtime
        execution_failure_criteria      u,c,p,w
        notification_failure_criteria   u,c,p,w
        dependency_period               24x7
        servicegroups                   Services
        }

Code: Select all

#!/bin/sh
# This is a sample shell script showing how you can submit the SCHEDULE_HOST_DOWNTIME command
# to Nagios. Adjust variables to fit your environment as necessary.
#
# SCHEDULE_HOST_DOWNTIME;<host_name>;<start_time>;<end_time>;<fixed>;<trigger_id>;<duration>;<author>;<comment>
#

now=`date +%s`
end=`date --date='1 hour' +%s`
commandfile='/usr/local/nagios/var/rw/nagios.cmd'
host=$1

/bin/printf "[%lu] SCHEDULE_HOST_DOWNTIME;%s;%lu;%lu;1;0;600;Patching Svc;Monitoring is disabled due to patching\n" ${now} ${host} ${now} ${end} > $commandfile


root@tgcs017:/usr/local/nagios/etc/objects/windowsservers#

Code: Select all

# 'patching_downtime.sh' command definition
define command{
    command_name        patching_downtime
    command_line        /usr/local/nagios/libexec/patching_downtime.sh
}

So, in summary: while the Patches service is stopped, no action should take place. As soon as the service is started, the downtime should be scheduled.

Any thoughts?

Thank you in advance

Tom
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Scheduled Downtime keeps being set by script

Post by ssax »

I set this up a different way, try this:

Code: Select all

define service{
        use                     generic-service
        host_name               hostname
        service_description     Patching Service Check
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        check_interval          5
        retry_interval          1
        contacts                noalerts
        notification_interval   0
        notification_period     24x7
        notification_options    n
        check_command           check_nrpe!check_winservice! -a '--service Patches'
        check_period            backup_period
        notification_period     backup_period
        event_handler           patching_downtime
        event_handler_enabled   1
        servicegroups           Services
        }

Code: Select all

define command {
    command_name    patching_downtime
    command_line    $USER1$/patching_downtime.sh '$HOSTNAME$' $SERVICESTATEID$ $HOSTDOWNTIME$
}

Code: Select all

#!/usr/bin/sh
# This is a sample shell script showing how you can submit the SCHEDULE_HOST_DOWNTIME command
# to Nagios. Adjust variables to fit your environment as necessary.
#
# SCHEDULE_HOST_DOWNTIME;<host_name>;<start_time>;<end_time>;<fixed>;<trigger_id>;<duration>;<author>;<comment>
#

now=`date +%s`
end=`date --date='1 hour' +%s`
commandfile='/usr/local/nagios/var/rw/nagios.cmd'
host="$1"
SERVICESTATEID="$2"
HOSTDOWNTIME="$3"
if [ "$SERVICESTATEID" = "0" ] && [ "$HOSTDOWNTIME" -eq "0" ]; then
    /bin/printf "[%lu] SCHEDULE_HOST_DOWNTIME;%s;%lu;%lu;1;0;600;Patching Svc;Monitoring is disabled due to patching\n" ${now} ${host} ${now} ${end} > $commandfile
fi

exit 0

The way this works: the service is always in a CRITICAL state when you're not patching, and it returns OK once you start the fake service. The event handler runs on every state change, so when the service goes to OK, the script submits the downtime only if the current service state is OK (meaning your fake service is now running) AND the host is NOT already in downtime.
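
The decision described above can be sketched as a small truth table. This is only an illustration: `should_schedule` is a hypothetical helper name, and the argument pairs stand in for the values Nagios would substitute for `$SERVICESTATEID$` and `$HOSTDOWNTIME$`.

```shell
#!/bin/sh
# Sketch of the event-handler decision; should_schedule is a hypothetical name.
# $1 = $SERVICESTATEID$ (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN)
# $2 = $HOSTDOWNTIME$   (0 = host is not currently in scheduled downtime)
should_schedule() {
    [ "$1" = "0" ] && [ "$2" = "0" ]
}

# Walk through a few example combinations and show what the handler would do.
for args in "0 0" "0 1" "2 0"; do
    set -- $args
    if should_schedule "$1" "$2"; then
        echo "state=$1 downtime=$2 -> schedule downtime"
    else
        echo "state=$1 downtime=$2 -> do nothing"
    fi
done
```

Only the first combination (service OK, host not in downtime) leads to a downtime being submitted.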

Will that work for you?

Re: Scheduled Downtime keeps being set by script

Post by kwhogster »

I now have this

Code: Select all

# 'patching_downtime.sh' command definition
define command{
    command_name        patching_downtime
    command_line        /usr/local/nagios/libexec/patching_downtime.sh '$HOSTNAME$' $SERVICESTATEID$ $HOSTDOWNTIME$
}
define command{
        command_name    check_win_patches
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c check_service -a service=Patches "crit=state = 'stopped'" warn=none
        }

Code: Select all

# patches.proto
define service{
        use                     generic-service
        host_name               hostname
        service_description     Patching Service Check
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        check_interval          5
        retry_interval          1
        contacts                noalerts
        notification_interval   0
        notification_period     24x7
        notification_options    n
        check_command           check_win_patches
        event_handler           patching_downtime
        event_handler_enabled   1
        servicegroups           Windows Updates
        }

define servicedependency {
        host_name                       hostname
        service_description             NRPE Status
        dependent_service_description   Patching Service Check
        execution_failure_criteria      u,c,p,w
        notification_failure_criteria   u,c,p,w
        dependency_period               24x7
}

Code: Select all

#!/bin/sh
# This is a sample shell script showing how you can submit the SCHEDULE_HOST_DOWNTIME command
# to Nagios. Adjust variables to fit your environment as necessary.
#
# SCHEDULE_HOST_DOWNTIME;<host_name>;<start_time>;<end_time>;<fixed>;<trigger_id>;<duration>;<author>;<comment>
#

now=`date +%s`
end=`date --date='1 hour' +%s`
commandfile='/usr/local/nagios/var/rw/nagios.cmd'
host=$1
SERVICESTATEID="$2"
HOSTDOWNTIME="$3"

if [ "$SERVICESTATEID" = "0" ] && [ "$HOSTDOWNTIME" -eq "0" ]; then
   /usr/bin/printf "[%lu] SCHEDULE_HOST_DOWNTIME;%s;%lu;%lu;1;0;600;Patching Svc;Monitoring is disabled due to patching\n" ${now} ${host} ${now} ${end} > $commandfile
fi

exit 0

I changed my mind on this. The Patches service is now meant to be running at all times, and I want to trigger the event when it stops.
Also, I had to change check_winservice to check_service, because check_winservice was not producing the return codes I expected.


When I stop the service, it goes to CRITICAL.

But it never schedules downtime. I tailed the Nagios log, in case it helps:

Code: Select all

root@tgcs017:/usr/local/nagios/etc/objects# tail /usr/local/nagios/var/nagios.log
[1540053576] Successfully launched command file worker with pid 28862
[1540053728] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;TGKW012;WSUS;1540053726
[1540053803] SERVICE ALERT: TGKW012;WSUS;UNKNOWN;SOFT;1;Command check_updates didn't terminate within the timeout period 60s
[1540053811] SERVICE ALERT: TGKW012;Patching Service Check;CRITICAL;SOFT;1;CRITICAL: Patches=stopped (auto), delayed ()
[1540053811] SERVICE EVENT HANDLER: TGKW012;Patching Service Check;CRITICAL;SOFT;1;patching_downtime
[1540053852] SERVICE ALERT: TGKW012;WSUS;OK;SOFT;2;scripts\check_windows_updates.ps1 : The module 'scripts' could not be loaded.
[1540053871] SERVICE ALERT: TGKW012;Patching Service Check;CRITICAL;SOFT;2;CRITICAL: Patches=stopped (auto), delayed ()
[1540053871] SERVICE EVENT HANDLER: TGKW012;Patching Service Check;CRITICAL;SOFT;2;patching_downtime
[1540053931] SERVICE ALERT: TGKW012;Patching Service Check;CRITICAL;HARD;3;CRITICAL: Patches=stopped (auto), delayed ()
[1540053931] SERVICE EVENT HANDLER: TGKW012;Patching Service Check;CRITICAL;HARD;3;patching_downtime

I tried running the script manually and I do not see anything either:
root@tgcs017:/usr/local/nagios/libexec# ./patching_downtime.sh TGKW012
root@tgcs017:/usr/local/nagios/libexec#




I'm a little puzzled as to why it is still not working.

Any ideas?


Thank you,

Re: Scheduled Downtime keeps being set by script

Post by kwhogster »

Is there any way I can find out what codes are being sent to the script?

When I run the script manually:
./patching_downtime.sh testhost 0 0

that schedules the downtime.

But when I stop the Patches service, nothing happens.

I tailed nagios.log but I see no activity.

Any ideas?
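
One way to see what the handler receives is to have it append its arguments to a file. The following is a minimal sketch under assumptions: `log_args` is a hypothetical helper, and `DEBUG_LOG` is an assumed path that the user running Nagios must be able to write to.

```shell
#!/bin/sh
# Hypothetical debug helper: append the arguments the handler received
# to a log file so they can be inspected after a state change.
# DEBUG_LOG is an assumed path; pick any file the nagios user can write to.
DEBUG_LOG="${DEBUG_LOG:-/tmp/patching_downtime.debug}"

log_args() {
    # Record a timestamp followed by host, $SERVICESTATEID$ and $HOSTDOWNTIME$.
    printf '%s host=%s state_id=%s host_downtime=%s\n' \
        "$(date +%s)" "$1" "$2" "$3" >> "$DEBUG_LOG"
}

# Log whatever this script was called with.
log_args "$@"
```

Placing the `log_args "$@"` call near the top of patching_downtime.sh and then stopping the service should leave one line per handler invocation in the file, which can be compared with the condition in the `if` statement.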
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Scheduled Downtime keeps being set by script

Post by scottwilkerson »

I can see the patching_downtime event handler is being triggered:

Code: Select all

[1540053811] SERVICE EVENT HANDLER: TGKW012;Patching Service Check;CRITICAL;SOFT;1;patching_downtime

But your code only executes if $SERVICESTATEID is 0, which it would not be when the service is CRITICAL as in the log line above; it would be 2.
kwhogster wrote:

Code: Select all

if [ "$SERVICESTATEID" = "0" ] && [ "$HOSTDOWNTIME" -eq "0" ]; then
   /usr/bin/printf "[%lu] SCHEDULE_HOST_DOWNTIME;%s;%lu;%lu;1;0;600;Patching Svc;Monitoring is disabled due to patching\n" ${now} ${host} ${now} ${end} > $commandfile
fi
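
For reference, the numeric values of `$SERVICESTATEID$` map to state names as sketched below, together with a version of the test that fires on CRITICAL instead of OK. The helper names `state_name` and `handler_should_fire` are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Map a $SERVICESTATEID$ value to its state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo INVALID ;;
    esac
}

# Sketch of the adjusted test: act when the service is CRITICAL (2)
# and the host is not already in downtime ($HOSTDOWNTIME$ is 0).
handler_should_fire() {
    [ "$1" = "2" ] && [ "$2" = "0" ]
}

state_name 2
if handler_should_fire 2 0; then
    echo "would submit SCHEDULE_HOST_DOWNTIME"
fi
```

With the service designed to be running normally and stopped during patching, the stopped state is the CRITICAL one, so the comparison value in the handler has to be 2 rather than 0.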

Re: Scheduled Downtime keeps being set by script

Post by kwhogster »

Thanks

Changing it to 2 did the trick.

This can now be locked as resolved.