service restart event handler repetition.

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
c.slagel
Posts: 57
Joined: Mon Dec 17, 2012 6:47 pm

service restart event handler repetition.

Post by c.slagel »

So we are using the service restart event handler from the wonderful guide Nagios provided. Most of the time it works great, but sometimes the services are so badly messed up that the restart doesn't fix the issue and we have to go in and manually restart them ourselves.

Is there an easy way to get the event handler to automatically trigger a second time if the service doesn't come back after the first try or after a certain amount of time?

Thanks!
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: service restart event handler repetition.

Post by slansing »

Man... it's been a while since I wrote that guide, I'm glad people still find it useful! You could probably trigger it a second time by submitting a passive check result to your service from its Advanced tab: send it an OK, then a CRITICAL again, and that should do it. The script on either end does not account for running a second time, since global event handlers are generally only triggered on state changes.
c.slagel
Posts: 57
Joined: Mon Dec 17, 2012 6:47 pm

Re: service restart event handler repetition.

Post by c.slagel »

Haha for sure, it's wonderful!

So, is there a way to automate that? This way would also require manual intervention, which we are trying to avoid.
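
For reference, the OK-then-CRITICAL passive results mentioned above could in principle be submitted from a script by writing `PROCESS_SERVICE_CHECK_RESULT` lines to the Nagios external command file instead of clicking through the Advanced tab. This is only a minimal sketch: the command file path is the usual source-install default and may differ on your system, the function names are made up for illustration, and the service must accept passive checks. Something (cron, another check) would still have to call it.

```shell
#!/bin/sh
# Sketch: force an OK -> CRITICAL state change via passive check results.
# ASSUMPTION: default command file location of a source install; adjust as needed.
CMD_FILE="${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# submit_result HOST SERVICE RETURN_CODE OUTPUT   (hypothetical helper)
# Return codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
submit_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$1" "$2" "$3" "$4" >> "$CMD_FILE"
}

# retrigger HOST SERVICE   (hypothetical helper)
# Sends an OK followed by a CRITICAL so Nagios sees a fresh state change.
retrigger() {
    submit_result "$1" "$2" 0 "Forced OK to reset state"
    submit_result "$1" "$2" 2 "Forced CRITICAL to re-run event handler"
}
```

It would be called as `retrigger <host_name> "<service description>"`, using the names exactly as they appear in the Nagios configuration.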
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: service restart event handler repetition.

Post by abrist »

Set the check to "is_volatile". This will treat every check as a state change. Next, add some logic to your event handler script to check to see if the state is hard critical (or whatever state and type on which you want to fire the event). Now, every check that is run will fire off the event handler. If all is ok, the extra logic should bail before running the actual/original event (restarting a service). If all is not well, the original event commands will fire. On the next iteration of the check, if the service is not ok, the event will fire again, and so on until the service recovers.
Does that make sense?
Former Nagios employee
c.slagel
Posts: 57
Joined: Mon Dec 17, 2012 6:47 pm

Re: service restart event handler repetition.

Post by c.slagel »

OK, so...

The script on the Nagios server only runs the restart if the service enters a hard CRITICAL state:

Code:

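#!/bin/sh
# Event Handler for Restarting Linux/BSD/Windows Services
# Assumes the command is defined with the state type as the fourth argument:
# $USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ $_SERVICESERVICE$ $SERVICESTATETYPE$
case "$1" in
    OK)
        ;;
    WARNING)
        ;;
    UNKNOWN)
        ;;
    CRITICAL)
        # Only restart once the problem is confirmed (HARD state).
        # "=" is used instead of "==" because "==" is not portable under /bin/sh.
        if [ "$4" = "HARD" ]; then
            /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c runcmd -a "$3"
        fi
        ;;
esac

exit 0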
The script on the server restarts the service with a 3 minute wait between stop and start.
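
The remote script itself is not shown in this thread. A stop, wait, start sequence like the one described could look roughly like the sketch below. It assumes a Linux host where `service NAME stop|start` works; `SERVICE_CMD`, `RESTART_DELAY` and `restart_with_delay` are hypothetical names introduced here, not taken from the guide.

```shell
#!/bin/sh
# Sketch of a remote-side restart with a pause between stop and start.
# ASSUMPTION: "service NAME stop|start" is available; override SERVICE_CMD if not.
SERVICE_CMD="${SERVICE_CMD:-service}"
RESTART_DELAY="${RESTART_DELAY:-180}"   # seconds; 3 minutes as described above

# restart_with_delay NAME   (hypothetical helper)
restart_with_delay() {
    # A failed stop is ignored on purpose: the service may already be dead.
    "$SERVICE_CMD" "$1" stop
    sleep "$RESTART_DELAY"
    # The function's exit status is that of the start command.
    "$SERVICE_CMD" "$1" start
}
```

With a 3 minute pause and a 4 minute retry interval, the next check lands about a minute after the start command is issued.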

The service is set with a retry interval of 4 minutes, with a max check attempt of 2. It is ALSO now set to volatile.

So if I'm understanding you correctly: if the service is still down after the second initial check, the event handler triggers. This is the normal part. But now that it's set to volatile, if the restart failed and the service is still down when it is rechecked another 4 minutes later, it will basically send a new hard state and re-trigger the event handler, repeating this every 4 minutes until the service returns an OK?
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: service restart event handler repetition.

Post by tmcdonald »

That sounds about right. Keeping a one-minute buffer between the service start and the next check was a good idea too.
Former Nagios employee