Event Handler problem

Locked
gixxx11
Posts: 16
Joined: Mon Apr 17, 2017 8:54 am

Event Handler problem

Post by gixxx11 »

I'm trying to configure the event handler to trigger on an HTTP service check with a result of warning or critical.

I've set "event_handler_DisableLoadBalancer" as the event handler for the service, and set it to "on".

The command line for event_handler_DisableLoadBalancer is "$USER1$/event_handler_DisableLoadBalancer.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$"

The contents of event_handler_DisableLoadBalancer.sh are:

Code: Select all

#!/bin/sh
#
# Event handler script for DisableLoadBalancer on testoe2016-1.iss.inter-state.com
# What state is the HTTP service in?
case "$1" in
WARNING)
/usr/local/nagios/libexec/check_nrpe -H testoe2016-1.iss.inter-state.com -p 5666 -c DisableLoadBalancer -a spooler
exit 0
The command inside event_handler_DisableLoadBalancer.sh, "/usr/local/nagios/libexec/check_nrpe -H testoe2016-1.iss.inter-state.com -p 5666 -c DisableLoadBalancer -a spooler", works exactly as I want when pasted into a terminal on my Nagios server.

As you can see, the code in event_handler_DisableLoadBalancer.sh references the specific machine (testoe2016-1.iss.inter-state.com) I want to trigger. I would prefer this to be more generic and reference the hostname of the service, so I can use this script on any of my many hosts. But since I can't get this simple version to work, I'm starting here.

Thanks for the assistance.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Event Handler problem

Post by scottwilkerson »

You have some syntax errors in your shell script.

Try this:

Code: Select all

#!/bin/sh
#
# Event handler script for DisableLoadBalancer on testoe2016-1.iss.inter-state.com
# What state is the HTTP service in?
case "$1" in
  WARNING)
  CRITICAL)
    /usr/local/nagios/libexec/check_nrpe -H testoe2016-1.iss.inter-state.com -p 5666 -c DisableLoadBalancer -a spooler
    ;;
esac
exit 0
If you change your command to

Code: Select all

$USER1$/event_handler_DisableLoadBalancer.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ "$HOSTNAME$"
you could do something like this

Code: Select all

#!/bin/sh
#
# Event handler script for DisableLoadBalancer
# What state is the HTTP service in?

host=$4

case "$1" in
  WARNING)
  CRITICAL)
    /usr/local/nagios/libexec/check_nrpe -H $host -p 5666 -c DisableLoadBalancer -a spooler
    ;;
esac
exit 0
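Since the host now arrives as the fourth argument, it may be worth checking that it is actually there before calling check_nrpe. A minimal sketch, assuming a helper named pick_host (hypothetical, not part of Nagios) and a placeholder host name web01.example.com:

```shell
#!/bin/sh
# Sketch: refuse to act when the host argument is missing or empty.
# pick_host is a hypothetical helper, not part of Nagios.
pick_host() {
    # $1..$3 are state, state type and attempt; $4 is the host name.
    if [ "$#" -lt 4 ] || [ -z "$4" ]; then
        echo "usage: handler STATE STATETYPE ATTEMPT HOST" >&2
        return 1
    fi
    printf '%s\n' "$4"
}

# web01.example.com is a placeholder host name.
if host="$(pick_host WARNING SOFT 1 "web01.example.com")"; then
    echo "would target: $host"
fi
if ! pick_host WARNING SOFT 1 2>/dev/null; then
    echo "missing host argument caught"
fi
```

In a real handler the same check would run against "$@" at the top of the script, before the case statement.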
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
gixxx11
Posts: 16
Joined: Mon Apr 17, 2017 8:54 am

Re: Event Handler problem

Post by gixxx11 »

OK, so I used your suggestion and set the command to:

Code: Select all

$USER1$/event_handler_DisableLoadBalancer.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ "$HOSTNAME$"
And set the shell script to:

Code: Select all

#!/bin/sh
#
# Event handler script for DisableLoadBalancer
# What state is the HTTP service in?

host=$4

case "$1" in
  WARNING)
  CRITICAL)
    /usr/local/nagios/libexec/check_nrpe -H $host -p 5666 -c DisableLoadBalancer -a spooler
    ;;
esac
exit 0
And still nothing. I'm stopping the website manually and then forcing a check in Nagios. The service goes from UP to WARNING and then nothing happens.

I took screenshots of each just in case it's useful:
https://www.dropbox.com/sh/ilibjcrslbk1 ... ByHza?dl=0
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Event Handler problem

Post by scottwilkerson »

It's hard to tell from the screenshot, but is there a " after $HOSTNAME$?

Also, can you run the test again after running the following command, to see whether the event handler is being triggered and whether there are any errors?

Code: Select all

tail -f /usr/local/nagios/var/nagios.log
You may also want to try running the command manually from the CLI

Code: Select all

/usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh WARNING SOFT 1 "testoe2016-1.iss.inter-state.com"
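If the log is busy, filtering it for the handler name can make the relevant lines easier to spot. A rough sketch, assuming a throwaway sample file in place of the real log; the first sample line is made up, and the second is the line reported later in this thread:

```shell
#!/bin/sh
# Sketch: filter a Nagios log for lines that mention one handler script.
# log_file is a temporary stand-in; on a real server point it at
# /usr/local/nagios/var/nagios.log instead.
handler_name="event_handler_DisableLoadBalancer"
log_file="$(mktemp)"
cat > "$log_file" <<'EOF'
[1553807700] placeholder line about something else
[1553807770] wproc:   stderr line 01: execvp(/usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh, ...) failed. errno is 13: Permission denied
EOF

# Count the matching lines, then print them.
matches="$(grep -c "$handler_name" "$log_file")"
echo "lines mentioning $handler_name: $matches"
grep "$handler_name" "$log_file"
rm -f "$log_file"
```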
gixxx11
Posts: 16
Joined: Mon Apr 17, 2017 8:54 am

Re: Event Handler problem

Post by gixxx11 »

The nagios log says:

Code: Select all

[1553807770] wproc:   stderr line 01: execvp(/usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh, ...) failed. errno is 13: Permission denied
When I manually run the command I get:

Code: Select all

-bash: /usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh: Permission denied
So, shot in the dark, I'm guessing it's a permissions issue...

How do I fix it? Thank you!
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Event Handler problem

Post by scottwilkerson »

Code: Select all

chmod +x /usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh
One caveat when using the version that takes the hostname: the host names configured in Nagios MUST be the actual hostnames you want the script to use.
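To confirm the permission fix took effect, and to catch script syntax errors before Nagios does, something like the following may help. It is only a sketch: check_handler is a hypothetical helper, and a temporary file stands in for the real handler script.

```shell
#!/bin/sh
# Sketch: confirm a handler script is executable and parses cleanly.
# The temp file stands in for the real handler; swap in your own path.
handler="$(mktemp)"
printf '#!/bin/sh\nexit 0\n' > "$handler"

check_handler() {
    if [ ! -x "$1" ]; then
        echo "not executable"
    elif ! sh -n "$1" 2>/dev/null; then
        # sh -n parses the script without running it
        echo "syntax error"
    else
        echo "ok"
    fi
}

before="$(check_handler "$handler")"
chmod +x "$handler"
after="$(check_handler "$handler")"
echo "before chmod: $before"
echo "after chmod:  $after"
rm -f "$handler"
```

Note that the check reflects the user running it. The handler has to be executable by whichever account the Nagios daemon runs as (typically nagios, but that depends on the install).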
gixxx11
Posts: 16
Joined: Mon Apr 17, 2017 8:54 am

Re: Event Handler problem

Post by gixxx11 »

Thanks for the heads-up on the hostnames.

After the permissions change I got this:

Code: Select all

/usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh: line 10: syntax error near unexpected token `)'
/usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh: line 10: `  CRITICAL)'
This is the current state of that code:

Code: Select all

#!/bin/sh
#
# Event handler script for DisableLoadBalancer
# What state is the HTTP service in?

host=$4

case "$1" in
  WARNING)
  CRITICAL)
    /usr/local/nagios/libexec/check_nrpe -H $host -p 5666 -c DisableLoadBalancer -a spooler
    ;;
esac
exit 0
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Event Handler problem

Post by ssax »

Try this:

Code: Select all

#!/bin/sh
#
# Event handler script for DisableLoadBalancer
# What state is the HTTP service in?

host="$4"
case "$1" in
  WARNING)
	;&
  CRITICAL)
    /usr/local/nagios/libexec/check_nrpe -H $host -p 5666 -c DisableLoadBalancer -a spooler
    ;;
esac
Test:

Code: Select all

./scriptname.sh WARNING blah blah localhost
./scriptname.sh CRITICAL blah blah localhost
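Some background on why this works: a case branch that goes straight from `WARNING)` to `CRITICAL)` makes the shell read `CRITICAL)` as a command inside the WARNING branch, hence the "unexpected token" error. The `;&` terminator is a bash fall-through extension (bash 4.0 and later), so it depends on /bin/sh being bash and may fail under a strict POSIX shell such as dash. A portable alternative is to list both states in one pattern. A minimal sketch, using a hypothetical handle_state helper that only echoes what the real handler would do:

```shell
#!/bin/sh
# Sketch: match both states with one POSIX case pattern (no ;& needed).
# handle_state is a hypothetical helper; it only prints the decision,
# so it is safe to run anywhere.
handle_state() {
    state="$1"
    host="$2"
    case "$state" in
        WARNING|CRITICAL)
            echo "would run check_nrpe against $host"
            ;;
        *)
            echo "nothing to do for $state"
            ;;
    esac
}

handle_state WARNING localhost
handle_state CRITICAL localhost
handle_state OK localhost
```

In the real script, the echo in the first branch would be replaced by the check_nrpe call.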
gixxx11
Posts: 16
Joined: Mon Apr 17, 2017 8:54 am

Re: Event Handler problem

Post by gixxx11 »

Perfection!

Thank you so very much. Everything is working like I want (for this one specific service at least). I tried critical and warning, both soft and hard and they all worked.

Thank you so very much!
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Event Handler problem

Post by ssax »

Or, you could do it like this:

Code: Select all

#!/bin/sh
#
# Event handler script for restarting the web server on the local machine
#
# Note: This script will only restart the web server if the service is
#       retried 3 times (in a "soft" state) or if the web service somehow
#       manages to fall into a "hard" error state.
#

# $USER1$/event_handler_DisableLoadBalancer.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ "$HOSTNAME$"
# /usr/local/nagios/libexec/event_handler_DisableLoadBalancer.sh WARNING SOFT 1 "testoe2016-1.iss.inter-state.com"
#
STATE="$1"
STATETYPE="$2"
ATTEMPT="$3"
HOST="$4"

# What state is the HTTP service in?
case "$STATE" in
	OK)
		# The service just came back up, so don't do anything...
		;;
	UNKNOWN)
		# We don't know what might be causing an unknown error, so don't do anything...
		;;
	WARNING)
		# Warning, restart...
		;&
	CRITICAL)
		# Aha! The HTTP service appears to have a problem - perhaps we should restart the server...
		# Is this a "soft" or a "hard" state?
		case "$STATETYPE" in

		# We're in a "soft" state, meaning that Nagios is in the middle of retrying the
		# check before it turns into a "hard" state and contacts get notified...
		SOFT)
			# What check attempt are we on?  We don't want to restart the web server on the first
			# check, because it may just be a fluke!
			case "$ATTEMPT" in

			# Wait until the check has been tried 3 times before restarting the web server.
			# If the check fails on the 4th time (after we restart the web server), the state
			# type will turn to "hard" and contacts will be notified of the problem.
			# Hopefully this will restart the web server successfully, so the 4th check will
			# result in a "soft" recovery.  If that happens no one gets notified because we
			# fixed the problem!
			3)
				echo -n "Restarting HTTP service (3rd soft critical state)..."
				# Call the init script to restart the HTTPD server
				/etc/rc.d/init.d/httpd restart
				;;
				esac
			;;

		# The HTTP service somehow managed to turn into a hard error without getting fixed.
		# It should have been restarted by the code above, but for some reason it didn't.
		# Let's give it one last try, shall we?  
		# Note: Contacts have already been notified of a problem with the service at this
		# point (unless you disabled notifications for this service)
		HARD)
			echo -n "Restarting service..."
			# Restart it through NRPE
			/usr/local/nagios/libexec/check_nrpe -H $HOST -p 5666 -c DisableLoadBalancer -a spooler
			;;
		esac
		;;
esac

exit 0
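For quick testing, the decision of when the script above acts can be flattened into a single pattern match. This is only a sketch of the when, not the what: should_act is a hypothetical helper that prints "act" or "skip", whereas the script above runs different commands in the SOFT and HARD branches.

```shell
#!/bin/sh
# Sketch: the same act/skip decision as the nested case, in one match.
# should_act is a hypothetical helper; it never touches a service.
should_act() {
    # $1 = state, $2 = state type, $3 = attempt number
    case "$1:$2:$3" in
        WARNING:SOFT:3|CRITICAL:SOFT:3) echo "act" ;;
        WARNING:HARD:*|CRITICAL:HARD:*) echo "act" ;;
        *) echo "skip" ;;
    esac
}

should_act WARNING SOFT 1
should_act CRITICAL SOFT 3
should_act CRITICAL HARD 4
should_act OK HARD 1
```

Running a table of state combinations through a helper like this is a cheap way to check the logic before forcing real checks in Nagios.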