[CLOSED] Event handler troubles

vos
Posts: 5
Joined: Wed Mar 30, 2016 12:33 pm

Post by vos »

I am trying to get event handlers to work for a remote service. When I run my event handler script manually, it restarts the service fine, but when it is set up in Nagios the restart is never triggered. I have checked permissions and so on, and I am running out of ideas as to what I am missing.

Event handler script in /etc/nagios/conf.d:

Code: Select all

4 -rwxr-xr-x 1 nagios nagios 2946 Mar 30 13:29 restarting-services.sh

Code: Select all

#!/usr/bin/env bash 
                                                                                           
date=$(date)
 
# What state is the service in?
case "${1}" in
OK)
        # The service just came back up, so don't do anything...
        ;;
WARNING)
        # We don't really care about warning states, since the service is probably still running...
        ;;
UNKNOWN)
        # We don't know what might be causing an unknown error, so don't do anything...
        ;;
CRITICAL)
        # Aha!  The service appears to have a problem - perhaps we should restart the server...
 
        # Is this a "soft" or a "hard" state?
        case "${2}" in
 
        # We're in a "soft" state, meaning that Nagios is in the middle of retrying the
        # check before it turns into a "hard" state and contacts get notified...
        SOFT)
                # What check attempt are we on?  We don't want to restart the web server on the first
                # check, because it may just be a fluke!
                case "${3}" in
 
                # Wait until the check has been tried 3 times before restarting the service.
                # If the check fails on the 4th time (after we restart the service), the state
                # type will turn to "hard" and contacts will be notified of the problem.
                # Hopefully this will restart the service successfully, so the 4th check will
                # result in a "soft" recovery.  If that happens no one gets notified because we
                # fixed the problem!
                3)
                        # Put the newline in the format string so printf emits it instead of a literal \n
                        printf '%s\n' "Restarting service ${6} (3rd soft critical state)..."
                        # Call NRPE to restart the service on the remote machine
                        /usr/lib/nagios/plugins/check_nrpe -H "${4}" -c restart-service -a "${5}"
                        echo "${date} - restart ${6} - SOFT"  >> /tmp/nagios-autorestart.log
                        ;;
                        esac
                ;;
        HARD)
                case "${3}" in
 
                4)
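                        # Put the newline in the format string so printf emits it instead of a literal \n
                        printf '%s\n' "Restarting ${6} service..."
                        # Log the attempt, then call NRPE to restart the service on the remote machine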
                        echo "${date} - restart ${6} - HARD"  >> /tmp/nagios-autorestart.log
                        /usr/lib/nagios/plugins/check_nrpe -H "${4}" -c restart-service -a "${5}"
                        ;;
                        esac
                ;;
        esac
        ;;
esac
exit 0
commands.cfg of service with event handler

Code: Select all

define command {
	command_name	 check_salt
	command_line	$USER1$/check_nrpe -H $HOSTADDRESS$ -u -t 60 -c check_procs -a salt
}
templates.cfg - notifications are turned off for testing

Code: Select all

define service {
     name salt_alive
     use generic-service
     check_command check_salt
     description Check salt is alive
     notification_options         w,c,r
     notifications_enabled        0
     contact_groups linux-team
     host_name my.server.com
     event_handler	restart-service!salt
}
event-handlers.cfg

Code: Select all

define command {
	command_name     restart-service
	command_line     /etc/nagios/conf.d/restarting-services.sh "$SERVICESTATE$" "$SERVICESTATETYPE$" "$SERVICEATTEMPT$" "$HOSTADDRESS$" "$ARG1$" "$SERVICEDESC$"
}
On the remote host, in nrpe.d/custom.cfg:

Code: Select all

command[restart-service]=/usr/bin/sudo /usr/sbin/service $ARG1$ restart
Can anyone spot what I am missing here? Thanks for your time and any pointers.

EDIT:
I forgot to mention that on the remote host I have allowed the nagios user to run the service command in sudoers.
Last edited by vos on Wed Mar 30, 2016 6:15 pm, edited 1 time in total.
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: Event handler troubles

Post by hsmith »

Do you see anything in /usr/local/nagios/var/nagios.log around the time the event handler should have fired?
vos
Posts: 5
Joined: Wed Mar 30, 2016 12:33 pm

Re: Event handler troubles

Post by vos »

It looks like Nagios is never sending a SOFT 3 or a HARD 4:

Code: Select all

[Wed Mar 30 17:00:36 2016] SERVICE ALERT: MyServer.com;Check salt is alive;OK;HARD;3;PROCS OK: 1 process with command name 'salt'
[Wed Mar 30 17:00:36 2016] SERVICE EVENT HANDLER: MyServer.com;Check salt is alive;OK;HARD;3;restart-service!salt
[Wed Mar 30 17:00:36 2016] SERVICE ALERT: MyServer.com;Check salt is alive;CRITICAL;SOFT;1;PROCS CRITICAL: 0 processes with command name 'salt'
[Wed Mar 30 17:00:36 2016] SERVICE EVENT HANDLER: MyServer.com;Check salt is alive;CRITICAL;SOFT;1;restart-service!salt
[Wed Mar 30 17:00:36 2016] SERVICE ALERT: MyServer.com;Check salt is alive;CRITICAL;SOFT;2;PROCS CRITICAL: 0 processes with command name 'salt'
[Wed Mar 30 17:00:36 2016] SERVICE EVENT HANDLER: MyServer.com;Check salt is alive;CRITICAL;SOFT;2;restart-service!salt
[Wed Mar 30 17:00:36 2016] SERVICE ALERT: MyServer.com;Check salt is alive;CRITICAL;HARD;3;PROCS CRITICAL: 0 processes with command name 'salt'
[Wed Mar 30 17:00:36 2016] SERVICE EVENT HANDLER: MyServer.com;Check salt is alive;CRITICAL;HARD;3;restart-service!salt
[Wed Mar 30 17:00:36 2016] SERVICE ALERT: MyServer.com;Check salt is alive;OK;HARD;3;PROCS OK: 1 process with command name 'salt'
[Wed Mar 30 17:00:36 2016] SERVICE EVENT HANDLER: MyServer.com;Check salt is alive;OK;HARD;3;restart-service!salt
[Wed Mar 30 17:00:36 2016] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;MyServer.com;Check salt is alive;1459371610
[Wed Mar 30 17:00:36 2016] SERVICE ALERT: MyServer.com;Check salt is alive;CRITICAL;SOFT;1;PROCS CRITICAL: 0 processes with command name 'salt'
[Wed Mar 30 17:00:36 2016] SERVICE EVENT HANDLER: MyServer.com;Check salt is alive;CRITICAL;SOFT;1;restart-service!salt
So this is probably related to how the service is set up. I am going to verify that again as well, but I am open to other thoughts and ideas if I am still off track. Thanks.
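For anyone reading a similar log, a quick way to see which state, state type, and attempt combinations the handler was actually invoked with is to filter the SERVICE EVENT HANDLER lines. This is only a minimal sketch: `handler_states` is a made-up helper name, and it assumes the semicolon-separated log format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nagios): read nagios.log-style lines on
# stdin and print each distinct STATE;STATETYPE;ATTEMPT the handler ran with.
handler_states() {
    # Field 1 is "[timestamp] SERVICE EVENT HANDLER: host", field 2 is the
    # service description, fields 3-5 are state, state type and attempt.
    awk -F';' '/SERVICE EVENT HANDLER:/ { print $3 ";" $4 ";" $5 }' | sort -u
}

# Demo on three lines copied from the log excerpt in this thread.
handler_states <<'EOF'
[Wed Mar 30 17:00:36 2016] SERVICE ALERT: MyServer.com;Check salt is alive;CRITICAL;SOFT;2;PROCS CRITICAL: 0 processes with command name 'salt'
[Wed Mar 30 17:00:36 2016] SERVICE EVENT HANDLER: MyServer.com;Check salt is alive;CRITICAL;SOFT;2;restart-service!salt
[Wed Mar 30 17:00:36 2016] SERVICE EVENT HANDLER: MyServer.com;Check salt is alive;CRITICAL;HARD;3;restart-service!salt
EOF
```

Against a real log, the same function could be fed the file on stdin, for example `handler_states < /usr/local/nagios/var/nagios.log` (the path mentioned earlier in the thread; it may differ on other installs). If CRITICAL;SOFT;3 and CRITICAL;HARD;4 never appear in the output, the handler script's restart branches are never reached.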
vos
Posts: 5
Joined: Wed Mar 30, 2016 12:33 pm

Re: Event handler troubles

Post by vos »

My generic service is set up with a maximum of 3 check attempts. Not sure what I am not grokking properly.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Event handler troubles

Post by rkennedy »

In your testing, were you using the root user or the nagios user? If root, please try again as the nagios user.

Just to verify, what are the permissions of the folder that contains restarting-services.sh?

Code: Select all

ls -l /etc/nagios/conf.d/
ls -l /etc/nagios/
From the Nagios machine, can you trigger the restart of the service using NRPE? Please post the full input and output of this. Just trying to figure out at which point it stops working.
vos
Posts: 5
Joined: Wed Mar 30, 2016 12:33 pm

Re: Event handler troubles

Post by vos »

restarting-services.sh

Code: Select all

ls -la /etc/nagios/conf.d/
total 8
drwxr-xr-x 2 nagios nagios   35 Mar 30 17:58 .
drwxr-xr-x 7 root   root   4096 Mar 30 13:26 ..
-rwxr-xr-x 1 nagios nagios 2866 Mar 30 17:58 restarting-services.sh
I ran the above script manually as both root and the nagios user.

Here is the output as the nagios user:

Code: Select all

su - nagios -c "bash -vx /etc/nagios/conf.d/restart-services.sh CRITICAL SOFT 3 MyServer.com salt-minion Check salt is alive"
Restarting service Check (3rd soft critical state)...\nsalt-minion stop/waiting
salt-minion start/running, process 7277
Running from nrpe

Code: Select all

/usr/lib/nagios/plugins/check_nrpe -H MyServer.com -c restart-service -a salt-minion
salt-minion stop/waiting
salt-minion start/running, process 7465
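A side note on the manual run above: it prints "Restarting service Check" because the service description was passed unquoted, so the script only sees the first word as its sixth argument. The restart still works because the service name is the fifth argument. The sketch below illustrates the word splitting; `sixth_arg` is a made-up stand-in for the handler script, not part of it.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the event handler: print only the sixth argument.
sixth_arg() {
    printf '%s\n' "${6}"
}

# Unquoted: the description is split into four words, so $6 is just "Check".
sixth_arg CRITICAL SOFT 3 MyServer.com salt-minion Check salt is alive

# Quoted: the whole description arrives as $6.
sixth_arg CRITICAL SOFT 3 MyServer.com salt-minion "Check salt is alive"
```

The restart-service command definition already wraps "$SERVICEDESC$" in double quotes, so this should only matter when testing by hand. Inside a `su - nagios -c "..."` string, the description would need its own inner quotes, for example single quotes.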
vos
Posts: 5
Joined: Wed Mar 30, 2016 12:33 pm

Re: Event handler troubles

Post by vos »

Figured it out :roll: I was not using max attempts properly with the script. I needed to raise max_check_attempts in the generic service to 4 so that Nagios sends the SOFT 3 and HARD 4 states the script is waiting for. I swear I tried that earlier, but I guess not. Hopefully this helps someone later on.
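To illustrate why the setting matters, here is a rough sketch of the state type and attempt number reported for a run of consecutive CRITICAL results. It is a simplified model based on the log posted earlier in the thread (SOFT until the last attempt, then HARD), and `critical_sequence` is a made-up helper, not a Nagios command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nagios): list the state type and attempt
# number for consecutive CRITICAL results, given max_check_attempts.
critical_sequence() {
    local max_attempts="$1"
    local attempt
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        if (( attempt < max_attempts )); then
            # Still retrying: the problem is in a SOFT state.
            printf 'CRITICAL;SOFT;%d\n' "$attempt"
        else
            # Last allowed attempt: the problem becomes HARD.
            printf 'CRITICAL;HARD;%d\n' "$attempt"
        fi
    done
}

echo "max_check_attempts=3:"
critical_sequence 3
echo "max_check_attempts=4:"
critical_sequence 4
```

With 3 attempts the third CRITICAL result is already HARD, so neither the SOFT 3 nor the HARD 4 branch of the script can match. With 4 attempts both combinations occur.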