service restart event handler repetition.

Posted: Wed Sep 17, 2014 6:43 pm
by c.slagel
So we are using the service restart event handler that Nagios provided a wonderful guide for. Most of the time it works awesome, but sometimes the services are so badly messed up that the restart doesn't fix the issue and we have to go in and manually restart them ourselves.

Is there an easy way to get the event handler to automatically trigger a second time if the service doesn't come back after the first try or after a certain amount of time?

Thanks!

Re: service restart event handler repetition.

Posted: Thu Sep 18, 2014 9:24 am
by slansing
Man... it's been a while since I wrote that guide, I'm glad people still find it useful! You could probably trigger it a second time by submitting a passive check result to your service from its Advanced tab: send it an OK, then a critical again, and that should do it. The script on either end does not account for running a second time, since global event handlers are generally only triggered on state changes.
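
For anyone who would rather script that than click through the Advanced tab, the same passive results can usually be submitted through the external command file. The sketch below only builds the `PROCESS_SERVICE_CHECK_RESULT` line. The function name `passive_result_line`, the `NOW` override, the host and service names, and the command file path in the comments are assumptions for illustration, and it presumes external commands and passive checks are enabled for the service.

```shell
#!/bin/sh
# Sketch: build a PROCESS_SERVICE_CHECK_RESULT external command line.
# Arguments: host name, service description, return code (0=OK, 2=CRITICAL),
# plugin output.
# NOW is an assumed override for the timestamp (handy for testing); otherwise
# the current epoch time from `date +%s` is used (available on GNU and BSD date).
passive_result_line() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "${NOW:-$(date +%s)}" "$1" "$2" "$3" "$4"
}

# Example with placeholder names and an assumed command file path:
# flip the service to OK, then back to CRITICAL.
# CMD_FILE=/usr/local/nagios/var/rw/nagios.cmd
# passive_result_line app01 "App Service" 0 "forced OK" > "$CMD_FILE"
# passive_result_line app01 "App Service" 2 "forced CRITICAL" > "$CMD_FILE"
```

Whether a single forced CRITICAL is enough to reach a hard state depends on the service's max check attempts, so treat this as a starting point rather than a drop-in fix.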

Re: service restart event handler repetition.

Posted: Thu Sep 18, 2014 1:09 pm
by c.slagel
Haha for sure, it's wonderful!

So, is there a way to automate that? That approach would still require manual intervention, which we are trying to avoid.

Re: service restart event handler repetition.

Posted: Thu Sep 18, 2014 2:45 pm
by abrist
Set the check to "is_volatile". This will treat every check as a state change. Next, add some logic to your event handler script to check to see if the state is hard critical (or whatever state and type on which you want to fire the event). Now, every check that is run will fire off the event handler. If all is ok, the extra logic should bail before running the actual/original event (restarting a service). If all is not well, the original event commands will fire. On the next iteration of the check, if the service is not ok, the event will fire again, and so on until the service recovers.
Does that make sense?

Re: service restart event handler repetition.

Posted: Mon Sep 29, 2014 6:13 pm
by c.slagel
OK, so...

The script on the nagios server only triggers event handlers if the service enters a hard state:

Code:

#!/bin/sh
# Event Handler for Restarting Linux/BSD/Windows Services
# Assumes the command line passes four arguments:
# $USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ $_SERVICESERVICE$ $SERVICESTATETYPE$
case "$1" in
        OK)
                ;;
        WARNING)
                ;;
        UNKNOWN)
                ;;
        CRITICAL)
                # Only restart once the state is HARD ($4 is the state type).
                if [ "$4" = "HARD" ]; then
                        /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c runcmd -a "$3"
                fi
                ;;
esac

exit 0
The script on the monitored server restarts the service with a 3-minute wait between stop and start.
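
A stop, wait, start sequence like that could be sketched roughly as follows. This is not the script from the guide: the function name `restart_with_wait`, the `SERVICE_CMD` and `WAIT_SECONDS` variables, and the use of `service <name> stop|start` are all assumptions, so swap in whatever your init system uses.

```shell
#!/bin/sh
# Hypothetical restart helper for the monitored host (not the guide's script).
# Stops the named service, waits, then starts it again.
#
# SERVICE_CMD  - assumed hook for the init command, defaults to "service"
# WAIT_SECONDS - pause between stop and start, defaults to 180 (3 minutes)
restart_with_wait() {
    svc="$1"
    if [ -z "$svc" ]; then
        echo "usage: restart_with_wait <service>" >&2
        return 2
    fi
    # A failed stop (service already dead) does not prevent the start below.
    "${SERVICE_CMD:-service}" "$svc" stop
    sleep "${WAIT_SECONDS:-180}"
    # The function's exit status is that of the start command.
    "${SERVICE_CMD:-service}" "$svc" start
}

# Example (placeholder service name):
# restart_with_wait httpd
```

Keeping the wait in a variable makes it easy to check that it stays shorter than the service's retry interval.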

The service is set with a retry interval of 4 minutes and max check attempts of 2. It is ALSO now set to volatile.
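
As a rough illustration, a service definition with those settings might look something like the fragment below. Only `retry_interval`, `max_check_attempts`, `is_volatile` and the event handler arguments come from this thread. The host name, service description, check command, `check_interval`, the command name `servicerestart` and the `_SERVICE` value are placeholders.

```
define service {
    host_name               app01                     ; placeholder
    service_description     App Service               ; placeholder
    check_command           check_nrpe!check_service  ; placeholder
    check_interval          5                         ; placeholder
    retry_interval          4                         ; minutes, assuming the default interval_length of 60
    max_check_attempts      2
    is_volatile             1
    event_handler           servicerestart            ; assumed command name
    _SERVICE                httpd                     ; placeholder, exposed as $_SERVICESERVICE$
}

define command {
    command_name            servicerestart            ; assumed command name
    command_line            $USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ $_SERVICESERVICE$ $SERVICESTATETYPE$
}
```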

So if I'm understanding you correctly: if the service is still down after the second check, the event handler triggers. That is the normal part. But now that it's set to volatile, if the restart failed and the service is still down when it rechecks 4 minutes later, Nagios will basically treat that as a new hard state and re-trigger the event handler, repeating this every 4 minutes until the service returns an OK?

Re: service restart event handler repetition.

Posted: Tue Sep 30, 2014 9:48 am
by tmcdonald
That sounds about right. Keeping a one-minute buffer between the service start and the next check was a good idea too.