Event Handler problem

Posted: Thu Mar 28, 2019 2:42 pm
by gixxx11
I'm trying to configure the event handler to trigger on an HTTP service check with a result of warning or critical.

I've set "event_handler_DisableLoadBalancer" as the event handler for the service, and set it to "on".

The content of the event_handler_DisableLoadBalancer is "$USER1$/event_handler_DisableLoadBalancer.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$"

The contents of event_handler_DisableLoadBalancer.sh are:

Code: Select all

#!/bin/sh
#
# Event handler script for DisableLoadBalancer on testoe2016-1.iss.inter-state.com
# What state is the HTTP service in?
case "$1" in
WARNING)
/usr/local/nagios/libexec/check_nrpe -H testoe2016-1.iss.inter-state.com -p 5666 -c DisableLoadBalancer -a spooler
exit 0
The command inside event_handler_DisableLoadBalancer.sh, "/usr/local/nagios/libexec/check_nrpe -H testoe2016-1.iss.inter-state.com -p 5666 -c DisableLoadBalancer -a spooler", works exactly as I want it to when pasted into the terminal on my Nagios server.

As you can see, the code in event_handler_DisableLoadBalancer.sh references the specific machine (testoe2016-1.iss.inter-state.com) I want to trigger. I would actually prefer this to be more generic and reference the hostname of the service's host, so I can use this script on any of my many hosts. But since I can't get this simple version to work, I'm starting here.

Thanks for the assistance.

Re: Event Handler problem

Posted: Thu Mar 28, 2019 2:51 pm
by scottwilkerson
You have some syntax errors in your shell script.

Try this

Code: Select all

#!/bin/sh
#
# Event handler script for DisableLoadBalancer on testoe2016-1.iss.inter-state.com
# What state is the HTTP service in?
case "$1" in
  WARNING)
  CRITICAL)
    /usr/local/nagios/libexec/check_nrpe -H testoe2016-1.iss.inter-state.com -p 5666 -c DisableLoadBalancer -a spooler
    ;;
esac
exit 0
If you change your command to

Code: Select all

$USER1$/event_handler_DisableLoadBalancer.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ "$HOSTNAME$"
you could do something like this

Code: Select all

#!/bin/sh
#
# Event handler script for DisableLoadBalancer
# What state is the HTTP service in?

host=$4

case "$1" in
  WARNING)
  CRITICAL)
    /usr/local/nagios/libexec/check_nrpe -H $host -p 5666 -c DisableLoadBalancer -a spooler
    ;;
esac
exit 0

Re: Event Handler problem

Posted: Thu Mar 28, 2019 3:27 pm
by gixxx11
Ok, so I used your suggestion and set the command to:

Code: Select all

$USER1$/event_handler_DisableLoadBalancer.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ "$HOSTNAME$"
And set the shell script to:

Code: Select all

#!/bin/sh
#
# Event handler script for DisableLoadBalancer
# What state is the HTTP service in?

host=$4

case "$1" in
  WARNING)
  CRITICAL)
    /usr/local/nagios/libexec/check_nrpe -H $host -p 5666 -c DisableLoadBalancer -a spooler
    ;;
esac
exit 0
And still nothing. I'm stopping the website manually and then forcing a check in Nagios. The service goes from OK to WARNING and then nothing happens.

I took screenshots of each just in case it's useful:
https://www.dropbox.com/sh/ilibjcrslbk1 ... ByHza?dl=0

Re: Event Handler problem

Posted: Thu Mar 28, 2019 3:57 pm
by scottwilkerson
It's hard to tell from the screenshot, but is there a " after $HOSTNAME$?

Also, can you run the test again after starting the following command, to see whether the event handler is being triggered and whether there are any errors?

Code: Select all

tail -f /usr/local/nagios/var/nagios.log
You may also want to try running the command manually from the CLI

Code: Select all

/usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh WARNING SOFT 1 "testoe2016-1.iss.inter-state.com"
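Another way to confirm whether Nagios is invoking the handler at all is to have the script record each call. The following is only a sketch: the function name log_invocation and the LOG_FILE path are assumptions for illustration, and the file must be writable by the user Nagios runs as.

```shell
#!/bin/sh
# Hypothetical debugging aid: record every invocation of the handler.
# LOG_FILE is an assumed path; pick any file the Nagios user can write to.
LOG_FILE="${LOG_FILE:-/tmp/event_handler_debug.log}"

log_invocation() {
    # Append a timestamp plus the four arguments passed by the command definition.
    printf '%s state=%s type=%s attempt=%s host=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2" "$3" "$4" >> "$LOG_FILE"
}

log_invocation "$@"
```

If these lines sit at the top of the handler, an empty log file after a forced check suggests the handler was never started, while a logged line means the problem is further down in the script.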

Re: Event Handler problem

Posted: Thu Mar 28, 2019 4:19 pm
by gixxx11
The nagios log says:

Code: Select all

[1553807770] wproc:   stderr line 01: execvp(/usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh, ...) failed. errno is 13: Permission denied
When I manually run the command I get:

Code: Select all

-bash: /usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh: Permission denied
So, shot in the dark, I'm guessing it's a permissions issue...

How do I fix it? Thank you!

Re: Event Handler problem

Posted: Thu Mar 28, 2019 4:27 pm
by scottwilkerson

Code: Select all

chmod +x /usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh
One caveat I will mention when using the version with the hostname in it: the host names configured in Nagios MUST be the actual hostnames you want the script to use.
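
The permission fix can be verified before waiting for Nagios to fire the handler. This is a minimal sketch; check_handler is a hypothetical helper name, not something shipped with Nagios, and the result reflects the user running it, so it is most meaningful when run as the user Nagios runs as.

```shell
#!/bin/sh
# Sketch: report whether a handler script can be executed by the current user.
# check_handler is a hypothetical helper name used for illustration.
check_handler() {
    script="$1"
    if [ ! -e "$script" ]; then
        echo "missing"
    elif [ -x "$script" ]; then
        echo "executable"
    else
        echo "not executable"
    fi
}

# Example (path taken from the thread):
# check_handler /usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh
```

A result of "not executable" corresponds to the "errno is 13: Permission denied" line seen in nagios.log.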

Re: Event Handler problem

Posted: Thu Mar 28, 2019 4:34 pm
by gixxx11
Thanks for the heads-up on the hostnames.

After the permissions change I got this:

Code: Select all

/usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh: line 10: syntax error near unexpected token `)'
/usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh: line 10: `  CRITICAL)'
This is the current state of that code:

Code: Select all

#!/bin/sh
#
# Event handler script for DisableLoadBalancer
# What state is the HTTP service in?

host=$4

case "$1" in
  WARNING)
  CRITICAL)
    /usr/local/nagios/libexec/check_nrpe -H $host -p 5666 -c DisableLoadBalancer -a spooler
    ;;
esac
exit 0

Re: Event Handler problem

Posted: Thu Mar 28, 2019 4:50 pm
by ssax
Try this:

Code: Select all

#!/bin/bash
#
# Event handler script for DisableLoadBalancer
# What state is the HTTP service in?
# Note: the ";&" fall-through below is a bash 4.0+ extension, hence /bin/bash.

host="$4"
case "$1" in
  WARNING)
    ;&
  CRITICAL)
    /usr/local/nagios/libexec/check_nrpe -H "$host" -p 5666 -c DisableLoadBalancer -a spooler
    ;;
esac
Test:

Code: Select all

./scriptname.sh WARNING blah blah localhost
./scriptname.sh CRITICAL blah blah localhost
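
For background on the earlier syntax error: every case branch has to be terminated before the next pattern starts, so "WARNING)" followed directly by "CRITICAL)" makes the shell read "CRITICAL)" as a command inside the WARNING branch and stop at the ")". The ";&" terminator works in bash 4.0 and later but is not part of POSIX sh. A portable alternative, sketched here with a hypothetical helper named needs_action, is to list both states in one pattern:

```shell
#!/bin/sh
# Sketch: one case branch shared by WARNING and CRITICAL, written with the
# POSIX "pattern|pattern)" form instead of a fall-through.
# needs_action is a hypothetical helper name used for illustration.
needs_action() {
    case "$1" in
        WARNING|CRITICAL)
            echo "yes"
            ;;
        *)
            echo "no"
            ;;
    esac
}
```

In the real handler, the check_nrpe call would take the place of the echo in the first branch.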

Re: Event Handler problem

Posted: Thu Mar 28, 2019 4:56 pm
by gixxx11
Perfection!

Thank you so very much. Everything is working like I want (for this one specific service at least). I tried critical and warning, both soft and hard and they all worked.

Thank you so very much!

Re: Event Handler problem

Posted: Thu Mar 28, 2019 5:01 pm
by ssax
Or you could do something like this:

Code: Select all

#!/bin/bash
# (bash rather than sh: the ";&" fall-through used below is a bash 4.0+ extension)
#
# Event handler script for restarting the web server on the local machine
#
# Note: This script will only restart the web server if the service is
#       retried 3 times (in a "soft" state) or if the web service somehow
#       manages to fall into a "hard" error state.
#

# $USER1$/event_handler_DisableLoadBalancer.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ "$HOSTNAME$"
# /usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh WARNING SOFT 1 "testoe2016-1.iss.inter-state.com"
#
STATE="$1"
STATETYPE="$2"
ATTEMPT="$3"
HOST="$4"

# What state is the HTTP service in?
case "$STATE" in
	OK)
		# The service just came back up, so don't do anything...
		;;
	UNKNOWN)
		# We don't know what might be causing an unknown error, so don't do anything...
		;;
	WARNING)
		# Warning, restart...
		;&
	CRITICAL)
		# Aha! The HTTP service appears to have a problem - perhaps we should restart the server...
		# Is this a "soft" or a "hard" state?
		case "$STATETYPE" in

		# We're in a "soft" state, meaning that Nagios is in the middle of retrying the
		# check before it turns into a "hard" state and contacts get notified...
		SOFT)
			# What check attempt are we on?  We don't want to restart the web server on the first
			# check, because it may just be a fluke!
			case "$ATTEMPT" in

			# Wait until the check has been tried 3 times before restarting the web server.
			# If the check fails on the 4th time (after we restart the web server), the state
			# type will turn to "hard" and contacts will be notified of the problem.
			# Hopefully this will restart the web server successfully, so the 4th check will
			# result in a "soft" recovery.  If that happens no one gets notified because we
			# fixed the problem!
			3)
				echo -n "Restarting HTTP service (3rd soft critical state)..."
				# Call the init script to restart the HTTPD server
				/etc/rc.d/init.d/httpd restart
				;;
			esac
			;;

		# The HTTP service somehow managed to turn into a hard error without getting fixed.
		# It should have been restarted by the code above, but for some reason it didn't.
		# Let's give it one last try, shall we?  
		# Note: Contacts have already been notified of a problem with the service at this
		# point (unless you disabled notifications for this service)
		HARD)
			echo -n "Restarting service..."
			# Restart it through NRPE
			/usr/local/nagios/libexec/check_nrpe -H "$HOST" -p 5666 -c DisableLoadBalancer -a spooler
			;;
		esac
		;;
esac

exit 0
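
The nested state, state type, and attempt logic above can be hard to test against a live service. As a rough sketch, the same decision can be isolated in a function that only prints what it would do. The name decide_action and the words it prints are assumptions made for illustration, and the 3rd-attempt threshold is taken from the comments in the script above.

```shell
#!/bin/sh
# Sketch of the decision logic only: it prints what would happen instead of
# calling check_nrpe. decide_action is a hypothetical name.
decide_action() {
    state="$1"; statetype="$2"; attempt="$3"
    case "$state" in
        WARNING|CRITICAL)
            case "$statetype" in
                SOFT)
                    # Act only on the 3rd soft attempt; earlier failures may be flukes.
                    if [ "$attempt" = "3" ]; then echo "act"; else echo "wait"; fi
                    ;;
                HARD)
                    # Last try once the state has gone hard.
                    echo "act"
                    ;;
                *)
                    echo "ignore"
                    ;;
            esac
            ;;
        *)
            # OK and UNKNOWN: nothing to do.
            echo "ignore"
            ;;
    esac
}
```

Calling it with combinations such as "CRITICAL SOFT 3" or "WARNING SOFT 1" shows which check results would trigger the NRPE command, without touching the load balancer.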