Need help with eventhandlers to restart services

TheWillofBlade
Posts: 2
Joined: Thu Feb 11, 2016 11:49 am

Need help with eventhandlers to restart services

Post by TheWillofBlade »

I'm getting started with Nagios and want to learn some specific, interesting features to present in class, such as how event handlers work, with one or a few working examples. More specifically, I'd like to automatically restart services when they stop working properly or change to a HARD state on my monitored Windows 7 host, and later on my Debian 7.8 host if everything goes well.

The official Nagios documentation has a PDF about this, but it only shows how to do it with the Nagios XI web interface. I only have Nagios Core installed, so I can't follow most of the steps. Out of frustration, I tried to install Nagios XI, but I canceled when a warning said that this type of installation is only for CentOS/Red Hat and may cause trouble if Nagios Core is already installed.

Besides that, the official Nagios Event Handlers documentation doesn't help me because its example of restarting the HTTP service is incomplete. The external pages with examples don't help either: most of them are very old, and sometimes I don't understand what they are doing.

I'd be very grateful if someone could show me a full example of how to restart a service using a Nagios event handler on a Windows host monitored with Nagios Core. In particular, I don't understand which file I should set the event handler up in or which commands I need to use. So far I've only worked a little with check_nrpe, check_snmp, check_nt and commands that show CPU load, memory usage, etc.

I'm running Nagios Core 4.1.1 on a Debian 7.8 VirtualBox machine with Nagios Plugins, SNMP and NRPE, monitoring a remote Windows 7 host with NSClient++ 0.4.4.15 (NRPE and SNMP installed & enabled).
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Need help with eventhandlers to restart services

Post by rkennedy »

I'm going to run through the instructions that you posted, https://assets.nagios.com/downloads/nag ... ios-XI.pdf - but adapted for Core. Some of the steps will be the same. The official documentation works for both XI and Core, so refer back to it if you get lost.

Create A Batch File For The Check

Open your favorite text editor and paste in the following code:

Code: Select all

@echo off
net stop %1
net start %1
@exit 0

Or, download it using this link:
http://assets.nagios.com/downloads/nagi ... runcmd.bat

Once completed, save it as a batch file, runcmd.bat, in your NSClient++ scripts directory, usually c:\program files\NSClient++\scripts

OK - we've created a script for NSClient++ to execute. Time to add it to the NSClient++ configuration. Add the following string to the list of External Scripts:

Code: Select all

runcmd=scripts\runcmd.bat "$ARG1$"

Also, verify that allow_arguments=1. If this variable is not set to 1, you will not be able to pass arguments to your scripts. Then save the .ini file.

Now that we've added the configuration so NSClient++ can pick up runcmd.bat, open services.msc and restart the NSClient++ service (nscp, if you prefer to do it from cmd). Let's test this before adding it to an event handler. Run the following commands on your Nagios machine, and they should restart the spooler service:

Code: Select all

cd /usr/local/nagios/libexec
./check_nrpe -H <Windows Host IP Address> -p 5666 -c runcmd -a spooler

Now, we need to create a shell script called servicerestart.sh in the /usr/local/nagios/libexec/ directory.

Code: Select all

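#!/bin/sh
# Event Handler for Restarting Windows Services
# $1 = service state, $2 = host address, $3 = Windows service name
case "$1" in
        OK)
                ;;
        WARNING)
                ;;
        UNKNOWN)
                ;;
        CRITICAL)
                # Ask NSClient++ (via NRPE) to run runcmd.bat for the service
                /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c runcmd -a "$3"
                ;;
esac

exit 0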
Change the permissions so that it will work with Nagios -

Code: Select all

chown nagios:nagios /usr/local/nagios/libexec/servicerestart.sh
chmod 775 /usr/local/nagios/libexec/servicerestart.sh
Now, modify your commands.cfg and add this -

Code: Select all

# 'service_restart' command definition
define command{
        command_name    service_restart
        command_line    $USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ $_SERVICESERVICE$
        }
Lastly, edit your service definition, and add this part to it -

Code: Select all

event_handler service_restart
Restart Nagios, and it should be working. Let me know if you run into any issues.
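
One step the walkthrough leaves implicit is where $_SERVICESERVICE$ gets its value. Nagios builds that macro from a custom variable named _SERVICE on the service definition. Here is a minimal sketch, written as a shell snippet so the definition can be generated and inspected before you copy it into your own config. The host name winserver, the Print Spooler check (borrowed from the check_nt SERVICESTATE style of the sample windows.cfg) and the scratch file path are all assumptions for illustration, not values from this thread:

```shell
#!/bin/sh
# Sketch: write an example service definition to a scratch file.
# Assumptions: host "winserver", the Print Spooler service and the
# scratch path below are placeholders, not values from this thread.
cfg="${TMPDIR:-/tmp}/spooler_service_example.cfg"

# The custom variable _SERVICE is what Nagios exposes to commands as
# $_SERVICESERVICE$ ("_SERVICE" prefix for service objects + the name).
cat > "$cfg" <<'EOF'
define service{
        use                     generic-service
        host_name               winserver
        service_description     Print Spooler
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l Spooler
        event_handler           service_restart
        _SERVICE                spooler
        }
EOF

# Show the custom variable line that feeds the macro.
grep '_SERVICE' "$cfg"
```

The value of _SERVICE should be the Windows service name that `net stop` and `net start` accept, since it ends up as %1 in runcmd.bat.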
Former Nagios Employee
TheWillofBlade
Posts: 2
Joined: Thu Feb 11, 2016 11:49 am

Re: Need help with eventhandlers to restart services

Post by TheWillofBlade »

Thanks, really appreciated. I didn't have any trouble following the steps, but at the end, when you tell me to edit my service definition, I'm not sure how to set it up or whether I should do it on a service or a host (as they did in the Nagios XI PDF). I tried adding a new service definition at the end of my /usr/local/nagios/etc/objects/windows.cfg file with the following config:

Code: Select all

define service{
        use                     generic-service
        host_name               Windows 50
        service_description     Restart Windows Service
        check_command           check-host-alive
        max_check_attempts              4
        normal_check_interval           5
        retry_check_interval            1
        register                        0
        event_handler   service_restart
        }
I think I did something wrong, but in any case the service doesn't even appear on my host, and I don't understand why.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Need help with eventhandlers to restart services

Post by rkennedy »

Can you post the result of verifying your config file? /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
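
One detail in the definition posted above is worth checking alongside the verification output: `register 0` tells Nagios that a definition is a template rather than a real object, so a service carrying it will not show up on the host. A rough sketch for spotting such definitions follows. The helper name find_unregistered_services is hypothetical, and the parsing assumes the usual one-directive-per-line layout:

```shell
#!/bin/sh
# Sketch: print the service_description of every "define service" block
# that contains "register 0" (i.e. is treated as a template).
# find_unregistered_services is a hypothetical helper name.
find_unregistered_services() {
    awk '
        # Start of a service block: reset per-block state.
        /^[[:space:]]*define[[:space:]]+service[[:space:]]*[{]/ {
            in_svc = 1; desc = ""; unreg = 0; next
        }
        # Remember the description (everything after the directive name).
        in_svc && $1 == "service_description" {
            $1 = ""; sub(/^[[:space:]]+/, ""); desc = $0
        }
        # Note whether this block is marked as a template.
        in_svc && $1 == "register" && $2 == "0" { unreg = 1 }
        # End of block: report it if it was unregistered.
        in_svc && /^[[:space:]]*[}]/ {
            if (unreg) print desc
            in_svc = 0
        }
    ' "$1"
}

# Usage: pass a config file as the first argument, for example
#   sh thisscript.sh /usr/local/nagios/etc/objects/windows.cfg
if [ -n "${1:-}" ]; then
    find_unregistered_services "$1"
fi
```

If the new service is listed, removing the `register 0` line (or setting it to 1) and re-running the verification is a reasonable next step.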
Former Nagios Employee
andresfvs
Posts: 7
Joined: Wed Apr 20, 2016 10:24 am

Re: Need help with eventhandlers to restart services

Post by andresfvs »

Hi guys, I have a question about this script. It works in my environment, but I can't find the definition of the $_SERVICESERVICE$ variable. I looked at this link https://assets.nagios.com/downloads/nag ... rvicestate but couldn't find it. Can you tell me where I can find it, or where I have to define the $_SERVICESERVICE$ variable?

Thank you!
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Need help with eventhandlers to restart services

Post by rkennedy »

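$_SERVICESERVICE$ is a custom variable macro that this setup uses. It refers to a custom variable named _SERVICE set in the service definition, so it won't be listed on the standard macro list.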

What are you looking to accomplish?
Former Nagios Employee
andresfvs
Posts: 7
Joined: Wed Apr 20, 2016 10:24 am

Re: Need help with eventhandlers to restart services

Post by andresfvs »

I'm trying to understand the language, the coding, and the samples, but in the sample I can't see where the variable is defined. Can you tell me where?

I'm really new to Nagios, and the help is confusing.
andresfvs
Posts: 7
Joined: Wed Apr 20, 2016 10:24 am

Re: Need help with eventhandlers to restart services

Post by andresfvs »

But, thinking about it, what I really need is a script that detects when an HTTP page returns an HTTP 500 error and restarts the service. If the page still shows the HTTP 500 error, it should send an email to the work team from the sh script. I'm still thinking about how to do it.
eloyd
Cool Title Here
Posts: 2129
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Need help with eventhandlers to restart services

Post by eloyd »

General approach for doing what you want with event handlers and notification:
  • Let's assume Nagios checks every five minutes to see if you get a 500 error
  • The event handler fires on state changes (including recovery to OK) and on each SOFT retry, so you teach it to exit without doing anything if the result is "OK"
  • When there is a 500 error, the event handler knows that this is the 1st SOFT CRITICAL
  • Meanwhile, Nagios starts checking every minute (let's assume) and will check for five total attempts before notifying
This gives your event handler four minutes until Nagios goes into a HARD CRITICAL state. In your event handler logic, you can do things like:
  • Send a command to Nagios to stop checking the service (this is an easy way to prevent checks from checking while your event handler is trying to restart something)
  • Try to restart the server (may require SSH or other remote access if the server is remote)
  • When done (or when it can't fix it) it sends Nagios the command to start doing service checks again
  • Optionally, teach your script to try again on the second SOFT CRITICAL
Now, as Nagios continues checking, it will check a total of five times. Your handler will have tried to restart one or more times, and if the restart has failed, Nagios will finally notify people through normal Nagios notifications.

HOWEVER, if the restart worked, the next Nagios check will go back to an OK state and no notifications will have been sent out. Optionally, you can send an email from within your script to let people know that the service was restarted. If you try to coordinate all of this from within the script itself, it gets very complex.
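
The approach above can be sketched as a small shell handler. This is only an illustration under stated assumptions: the helper names should_restart and svc_check_command are hypothetical, the handler is assumed to receive $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ as its first three arguments, and the restart itself is left out. DISABLE_SVC_CHECK and ENABLE_SVC_CHECK are standard Nagios external commands.

```shell
#!/bin/sh
# Sketch of the decision logic for an event handler.
# Assumed arguments: $1 = $SERVICESTATE$, $2 = $SERVICESTATETYPE$,
# $3 = $SERVICEATTEMPT$ (how the command definition passes them is up to you).

# should_restart (hypothetical helper): prints "restart" or "skip".
should_restart() {
    state="$1"; state_type="$2"; attempt="$3"
    case "$state" in
        CRITICAL)
            # Act on the 1st and 2nd SOFT CRITICAL only; once the state
            # is HARD, leave it to normal Nagios notifications.
            if [ "$state_type" = "SOFT" ] && [ "$attempt" -le 2 ]; then
                echo restart
            else
                echo skip
            fi
            ;;
        *)
            # OK, WARNING, UNKNOWN: nothing to do.
            echo skip
            ;;
    esac
}

# svc_check_command (hypothetical helper): formats an external command
# line that pauses or resumes active checks of one service.
# $1 = ENABLE or DISABLE, $2 = host name, $3 = service description
svc_check_command() {
    printf '[%s] %s_SVC_CHECK;%s;%s\n' "$(date +%s)" "$1" "$2" "$3"
}

# Example decision for a first SOFT CRITICAL result.
should_restart CRITICAL SOFT 1
```

In a real handler, the lines printed by svc_check_command would be appended to the Nagios external command file (the path is set by command_file in nagios.cfg) before and after the restart attempt.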
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoydI'm a Nagios Fanatic!
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Need help with eventhandlers to restart services

Post by ssax »

Thanks eloyd!

andresfvs, let us know if you have any additional questions.


Thank you