I found a guide on how to restart a service via an event handler (https://answerhub.nagios.com/support/s/ ... A-f4a44019), but the problem is that we already have a service check that monitors 5 critical services at once. Rather than having 5 different service checks, we combined them into one: less network traffic, fewer service checks to maintain if something needs to change, and it just seems cleaner.
The question is how to create the event handler. Will it correctly capture the individual service that is failing or not running in the check? Currently, if one of the services is not running, the check does return CRITICAL and calls out the failed service. But the status of every other service is still in that returned value. I don't know if this will be a problem or not, so I figured I'd ask here first.
Restart Service
- jmichaelson
- Posts: 383
- Joined: Wed Aug 23, 2023 1:02 pm
Re: Restart Service
Hi Angelo,
From the doc that you linked to:
You can now test the script works by executing the following command:
/usr/local/nagios/libexec/restart_service.sh CRITICAL 10.25.14.3 Str0ngT0k3n spooler
When the script is run, it receives four arguments which are referenced as $1, $2, $3, $4 in the script.
$1 = The state of the service.
$2 = The host address of the Windows server.
$3 = The NCPA Token on the Windows server.
$4 = The name of the service being restarted.
You can see from the script above that it's only when the service is in a CRITICAL state that the restart_service.sh command will be executed.
As long as you can get the name of the service (which it sounds like you can) and can get it into this particular event handler, you should be good. What I'm not so sure of is how you'd get it into the variables (as shown in the Manage Free Variables section of the document), since that is a static string (spooler in the case of the document here).
If you go further down the restart services document, there's a reference to another doc, https://assets.nagios.com/downloads/nag ... tvars.html, which may be of value for what you're trying to do.
If you provide details of how you're checking multiple Windows services in a single check, we might be able to provide better guidance!
Please let us know if you have any other questions or concerns.
-Jason
- DoubleDoubleA
- Posts: 286
- Joined: Thu Feb 09, 2017 5:07 pm
Re: Restart Service
Hi @AngeloMileto,
As you already see, the challenging part of the event handler you are attempting is that you're checking multiple things in one check, so your event handler has to figure out which one is broken.
Presuming your check output explicitly says which service is broken, I would consider using the $SERVICEOUTPUT$ or $LONGSERVICEOUTPUT$ macro in your event handler command definition and then parse out exactly which service is being referenced, and then restart that.
Code: Select all
define command{
    command_name restart-one-of-many
    command_line /usr/local/nagios/libexec/eventhandlers/restart-one-of-many $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ "$SERVICEOUTPUT$"
}
Then you would have a switch/case block on $4 (the check output, quoted above so it arrives as a single argument) in your bash event handler script to sort out which service needs to be restarted, and then restart that.
https://assets.nagios.com/downloads/nag ... olist.html
https://assets.nagios.com/downloads/nag ... dlers.html
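A minimal sketch of the parsing step described above, not a tested handler. It assumes the check output reaches the script as one quoted argument ($4) and reports failures as comma-separated "NAME is stopped" entries after a "CRITICAL:" style prefix. The helper name stopped_services is hypothetical, and the restart itself is left as a placeholder.

```shell
#!/bin/sh
# Sketch only: find which services a combined check reports as stopped.
# Assumed output format (adjust the patterns to your plugin's real output):
#   "CRITICAL: Svc2 is stopped (should be running), Svc1 is running"

# stopped_services OUTPUT  (hypothetical helper)
# Prints one stopped service name per line; prints nothing if none are stopped.
# Steps: drop the "STATE: " prefix, split on commas, keep "NAME is stopped" entries.
stopped_services() {
    printf '%s\n' "$1" |
        sed 's/^[A-Z]*: *//' |
        tr ',' '\n' |
        sed -n 's/^ *\(.*\) is stopped.*$/\1/p'
}

state=${1:-}        # $SERVICESTATE$
state_type=${2:-}   # $SERVICESTATETYPE$
output=${4:-}       # "$SERVICEOUTPUT$" (quoted in the command definition)

case "$state" in
CRITICAL)
    # Acting only on HARD states is a choice made for this sketch.
    if [ "$state_type" = "HARD" ]; then
        stopped_services "$output" | while IFS= read -r svc; do
            # Placeholder: call your real restart command for "$svc" here.
            echo "would restart: $svc"
        done
    fi
    ;;
esac
```

Called as `restart-one-of-many CRITICAL HARD 3 "CRITICAL: Service2 is stopped (should be running), Service1 is running"`, the loop body would run once, for Service2. Service names that themselves contain commas would break the split.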
Let us know how it goes!
Aaron
- AngeloMileto
- Posts: 70
- Joined: Mon Mar 21, 2022 7:53 am
Re: Restart Service
I appreciate the replies and to simplify things, I took one of the services and made a separate check and hooked the event script to that.
So I have a simple service check: check_xi_ncpa!--timeout=45 --token=a really good token --port=5693 --metric='services' --queryargs='service=svcname,status=running'!!!!!!!
Event Handler: Service Restart - Windows
command_name Service Restart - Windows
command_line $USER1$/restart_service.sh $SERVICESTATE$ $HOSTADDRESS$ inserttokenhere $_SERVICESERVICE$
Code: Select all
libexec/restart_service.sh
#!/bin/sh
# $1 = service state, $2 = host address, $3 = NCPA token, $4 = service name
case "$1" in
OK)
    ;;
WARNING)
    ;;
UNKNOWN)
    ;;
CRITICAL)
    # Double quotes so the shell expands $2, $3 and $4
    /usr/local/nagios/libexec/check_ncpa.py --hostname="$2" --timeout=45 --token="$3" --port=5693 --metric='plugins/restart_service.ps1' --queryargs="$4"
    echo "restart_service.ps1" > /tmp/restart_test.txt
    ;;
esac
exit 0
Code: Select all
restart_service.ps1
param([String]$SvcName='NO_SVC')
Set-StrictMode -Version 3
# Start the service only if it is currently stopped.
$SvcStatus = (Get-Service -Name $SvcName).Status
if ($SvcStatus -eq "Stopped")
{
    Start-Service -Name $SvcName -PassThru -ErrorAction SilentlyContinue
}
# Report the outcome through the output text and the exit code.
if ((Get-Service -Name $SvcName).Status -ne "Running")
{
    echo "Service failed to restart!"
    exit 1
}
else
{
    echo "Service was restarted."
    exit 0
}
I can then submit a passive check result to force the event handler to trigger, but I can't get the result back (whether the service did or did not restart). So there is no indication in the XI web UI that the event handler was triggered or, if it was, whether it succeeded.
As for the command I'm using to get multiple services at once:
Code: Select all
check_command: check_xi_ncpa.py --timeout=45 --token=reallycooltoken --port=5693 --metric='services' --queryargs='service=Service1,service=Service2,service=Service3,status=running'!!!!!!!
What would show up is either "OK: Service1 is running, Service2 is running, Service3 is running" or "CRITICAL: Service2 is stopped (should be running), Service1 is running, Service3 is running".
So you see, it's really good at reporting: if even a single one of the services is not running, the entire check goes CRITICAL and it knows which one has failed.
For now though, if I can get the single service working with the ability to return/report on the status of the event handler that would be a good start.
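One possible way to make the outcome visible, sketched under assumptions rather than taken from the Nagios docs linked in this thread: have the event handler log what it did and also submit the result as a passive check for a separate service. The log path, the command file path, the "Service Restart Result" service name, and both function names are assumptions. The `PROCESS_SERVICE_CHECK_RESULT` line follows the Nagios Core external command format.

```shell
#!/bin/sh
# Sketch only: record what the event handler did so the outcome is visible.
# Both paths and the "Service Restart Result" service name are assumptions.

LOG_FILE=${LOG_FILE:-/tmp/restart_service.log}
CMD_FILE=${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}

# format_passive_result HOST SERVICE_DESCRIPTION RETURN_CODE MESSAGE
# Prints one external command line for a passive service check result.
format_passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$1" "$2" "$3" "$4"
}

# report_result HOST RETURN_CODE MESSAGE
# Always appends the line to the log file; also writes it to the Nagios
# command file when that file exists as a named pipe.
report_result() {
    line=$(format_passive_result "$1" "Service Restart Result" "$2" "$3")
    printf '%s\n' "$line" >> "$LOG_FILE"
    if [ -p "$CMD_FILE" ]; then
        printf '%s\n' "$line" > "$CMD_FILE"
    fi
}
```

In the CRITICAL branch of restart_service.sh, the output and exit code of the check_ncpa.py call could be captured and passed on, for example `report_result "$5" "$rc" "$output"`, where `$5` would be `$HOSTNAME$` added as an extra argument to the command definition. Passive results are matched on the Nagios host name, not the address, and a passive service with the assumed description would have to exist on that host.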