Re: NRPE: Automatic restart of multiple services
Posted: Fri Jul 10, 2015 9:33 am
by mhixson2
ssax wrote: Please post the full check (and the output) that you are running initially to check the services so I can see if it has what we need.
service/check definition:
Code:
define service {
    host_name               [hostname]
    service_description     #TEST restart dead service
    use                     #standard-service
    check_command           check_nrpe!check_service!service='Citrix Encryption Service' service=CitrixCseEngine service=IMAService service=MFCom service=cpsvc service=IMAAdvanceSrv service=RadeHlprSvc service=RadeSvc!'crit=not state_is_ok()'!!!!!
    event_handler           restart_service
    event_handler_enabled   1
    _SERVICE                "Citrix Encryption Service",CitrixCseEngine,IMAService,MFCom,cpsvc,IMAAdvanceSrv,RadeHlprSvc,RadeSvc
    register                1
}
command run to test check:
Code:
./check_nrpe -H [hostname] -t 30 -c check_service -a service='Citrix Encryption Service' service=CitrixCseEngine service=IMAService service=MFCom service=cpsvc service=IMAAdvanceSrv service=RadeHlprSvc service=RadeSvc 'crit=not state_is_ok()'
output:
Let me know if you need more.
Thanks!
Re: NRPE: Automatic restart of multiple services
Posted: Fri Jul 10, 2015 12:45 pm
by ssax
What is the output when one of the services is stopped? What about this command?
You could try it with the Spooler service if you want something to test with.
Code:
./check_nrpe -H [hostname] -t 30 -c check_service -a 'Citrix Encryption Service' CitrixCseEngine IMAService MFCom cpsvc IMAAdvanceSrv RadeHlprSvc
Re: NRPE: Automatic restart of multiple services
Posted: Mon Jul 13, 2015 9:48 am
by mhixson2
ssax wrote: What is the output when one of the services is stopped?
./check_nrpe -H [hostname] -t 30 -c check_service -a service=Spooler 'crit=not state_is_ok()' returns:
CRITICAL: Spooler=stopped (auto), delayed ()|
What about this command?
Code:
./check_nrpe -H [hostname] -t 30 -c check_service -a 'Citrix Encryption Service' CitrixCseEngine IMAService MFCom cpsvc IMAAdvanceSrv RadeHlprSvc
returned:
CHECK_NRPE: Invalid packet type received from server.
Thanks
Re: NRPE: Automatic restart of multiple services
Posted: Mon Jul 13, 2015 1:34 pm
by ssax
What client (and version) are you using on the remote host?
Re: NRPE: Automatic restart of multiple services
Posted: Mon Jul 13, 2015 1:42 pm
by mhixson2
ssax wrote: What client (and version) are you using on the remote host?
It's a Windows host using NSClient++ (NSCP-0.4.3.143-x64.msi).
Re: NRPE: Automatic restart of multiple services
Posted: Mon Jul 13, 2015 5:08 pm
by ssax
Try this. It doesn't use the _SERVICE variable at all; it works on the check_nrpe output instead.
Change your event handler command to:
Code:
$USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ '$SERVICEOUTPUT$'
Then edit your servicerestart.sh script and change it to:
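Code:
#!/bin/bash
# Event Handler for Restarting Windows Services
# Arguments: $1 = service state, $2 = host address, $3 = service output
# Needs bash, not plain sh: it uses pattern substitution, arrays and here-strings.
case "$1" in
OK)
    ;;
WARNING)
    ;;
UNKNOWN)
    ;;
CRITICAL)
    # Text to strip from the check output so only the service names remain
    crittext='CRITICAL: '
    autodeltext=' (auto), delayed ()'
    autotext=' (auto)'
    stoppedtext='=stopped'
    stripped=${3//$crittext/}
    stripped=${stripped//$autodeltext/}
    stripped=${stripped//$autotext/}
    stripped=${stripped//$stoppedtext/}
    stripped=${stripped//, /,}
    # Split the comma-separated names into an array
    IFS=',' read -a array <<< "$stripped"
    # Wrap any name that contains a space in single quotes
    pattern='\ '
    for ((i=0; i<${#array[@]}; i++));
    do
        if [[ ${array[$i]} =~ $pattern ]]; then
            array[$i]="'${array[$i]}'"
        fi
    done
    # Re-join the names with commas and hand them to the remote restart command
    services=$(IFS=,; echo "${array[*]}")
    /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c runcmd -a "$services"
    ;;
esac
exit 0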
Then change your Windows batch file to:
Code:
@echo off
SET SERVICES=%1
SET SERVICESM=%SERVICES:'=%
:: Loop through services and restart them
:LOOP
FOR /F "tokens=1,* delims=," %%F IN (%SERVICESM%) DO (
    net stop "%%F"
    net start "%%F"
    SET SERVICESM="%%G"
    GOTO LOOP
)
@exit 0
Let me know if that works for you.
Re: NRPE: Automatic restart of multiple services
Posted: Tue Jul 14, 2015 10:31 am
by mhixson2
Hmm.. no, not working at all.
I changed the name of the shell script on my end a while ago, so I updated the command to $USER1$/restart_service.sh $SERVICESTATE$ $HOSTADDRESS$ $SERVICEOUTPUT$ to reflect the name change.
The command name is restart_service as well, so I changed runcmd in the shell script you provided to restart_service.
I also simplified things for this test again, so the service is only monitoring Spooler. When stopped, the service does not auto-restart.
However, when running ./check_nrpe -H [hostname] -p 5666 -c restart_service -a Spooler manually on the Nagios server, the service restarts successfully:
Code:
The Print Spooler service is stopping.
The Print Spooler service was stopped successfully.
The Print Spooler service is starting.
The Print Spooler service was started successfully.|
Thanks!
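A possible explanation worth ruling out, offered as a guess rather than a diagnosis: the command line above passes $SERVICEOUTPUT$ without the single quotes that the suggested version had. The sketch below only illustrates shell word splitting. `show_third` is a hypothetical stand-in for the event handler script and 192.0.2.10 is a placeholder address. Nagios substitutes the macro text into the command line before running it, so the real behavior may differ (unquoted parentheses and the pipe could also be interpreted by the shell).

```shell
#!/bin/bash
# Hypothetical stand-in for the event handler: prints the third argument it receives.
show_third() { printf '%s\n' "$3"; }

# Check output taken from earlier in the thread.
output='CRITICAL: Spooler=stopped (auto), delayed ()|'

# Unquoted: the output is split on whitespace, so $3 is only the first word.
# (set -f turns off filename expansion so only word splitting is shown.)
set -f
unquoted=$(show_third CRITICAL 192.0.2.10 $output)
set +f

# Quoted: the whole output arrives as a single third argument.
quoted=$(show_third CRITICAL 192.0.2.10 "$output")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

If the handler only ever receives `CRITICAL:` as its third argument, the stripping steps would leave no real service name to pass on to the restart command.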
Re: NRPE: Automatic restart of multiple services
Posted: Wed Jul 15, 2015 9:46 am
by jdalrymple
Looks like the script isn't accounting for the pipe for perfdata. Try this:
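Code:
#!/bin/bash
# Event Handler for Restarting Windows Services
# Arguments: $1 = service state, $2 = host address, $3 = service output
# Needs bash, not plain sh: it uses pattern substitution, arrays and here-strings.
case "$1" in
OK)
    ;;
WARNING)
    ;;
UNKNOWN)
    ;;
CRITICAL)
    # Text to strip from the check output so only the service names remain
    crittext='CRITICAL: '
    autodeltext=' (auto), delayed ()'
    autotext=' (auto)'
    stoppedtext='=stopped'
    perfpipetext='|'
    stripped=${3//$crittext/}
    stripped=${stripped//$autodeltext/}
    stripped=${stripped//$autotext/}
    stripped=${stripped//$stoppedtext/}
    # Also drop the pipe that separates the output from the perfdata
    stripped=${stripped//$perfpipetext/}
    stripped=${stripped//, /,}
    # Split the comma-separated names into an array
    IFS=',' read -a array <<< "$stripped"
    # Wrap any name that contains a space in single quotes
    pattern='\ '
    for ((i=0; i<${#array[@]}; i++));
    do
        if [[ ${array[$i]} =~ $pattern ]]; then
            array[$i]="'${array[$i]}'"
        fi
    done
    # Re-join the names with commas and hand them to the remote restart command
    services=$(IFS=,; echo "${array[*]}")
    /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c runcmd -a "$services"
    ;;
esac
exit 0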
Also, try it with multiple stopped services, I have a feeling....
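The stripping steps can be tried on their own, without Nagios or NRPE in the loop. This is a minimal sketch that wraps the same substitutions in a hypothetical function, `extract_services`, and feeds it the single-service output quoted earlier in the thread. The second sample line, with a service name containing spaces, is an assumed output and not one captured from a real check. The patterns are quoted here so they match literally, which is a small deviation from the script above. The exact text NSClient++ prints when several services are stopped is not shown in the thread, so that case would need to be captured from a real check before relying on this.

```shell
#!/bin/bash
# extract_services is a hypothetical helper, not part of the thread's script.
# It turns check_service output into a comma-separated list of service names.
extract_services() {
    local stripped=$1 i text
    local -a names
    # Remove the decoration around the names, in the same order as the script.
    for text in 'CRITICAL: ' ' (auto), delayed ()' ' (auto)' '=stopped' '|'; do
        stripped=${stripped//"$text"/}
    done
    stripped=${stripped//, /,}
    # Split on commas only, so spaces inside a name are preserved.
    IFS=',' read -r -a names <<< "$stripped"
    # Wrap names that contain a space in single quotes.
    for ((i = 0; i < ${#names[@]}; i++)); do
        if [[ ${names[$i]} == *' '* ]]; then
            names[$i]="'${names[$i]}'"
        fi
    done
    (IFS=,; echo "${names[*]}")
}

# Output reported in the thread for a stopped Spooler service.
single=$(extract_services 'CRITICAL: Spooler=stopped (auto), delayed ()|')
# Assumed output for a service whose name contains spaces.
spaced=$(extract_services 'CRITICAL: Citrix Encryption Service=stopped (auto), delayed ()|')
echo "$single"
echo "$spaced"
```

Running the function against output copied from a real check with two or more stopped services would show whether the substitutions still produce a clean list.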
Re: NRPE: Automatic restart of multiple services
Posted: Wed Jul 15, 2015 3:18 pm
by mhixson2
Unfortunately it's still not working. Here are the details of my config in case I missed something along the way.
Service:
Code:
host_name [hostname]
service_description #TEST restart dead service
use #standard-service
check_command check_nrpe!check_service!service='Citrix Encryption Service' service=CitrixCseEngine service=IMAService service=MFCom service=cpsvc service=IMAAdvanceSrv service=RadeHlprSvc service=RadeSvc!'crit=not state_is_ok()'!!!!!
event_handler restart_service
event_handler_enabled 1
_SERVICE "Citrix Encryption Service",CitrixCseEngine,IMAService,MFCom,cpsvc,IMAAdvanceSrv,RadeHlprSvc,RadeSvc
register 1
Batch: (location on host Windows server: C:\Program Files\NSClient++\scripts)
Code:
@echo off
SET SERVICES=%1
SET SERVICESM=%SERVICES:'=%
:: Loop through services and restart them
:LOOP
FOR /F "tokens=1,* delims=," %%F IN (%SERVICESM%) DO (
    net stop "%%F"
    net start "%%F"
    SET SERVICESM="%%G"
    GOTO LOOP
)
@exit 0
Shell script (location on Nagios server: /usr/local/nagios/libexec)
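Code:
#!/bin/bash
# Event Handler for Restarting Windows Services
# Arguments: $1 = service state, $2 = host address, $3 = service output
# Needs bash, not plain sh: it uses pattern substitution, arrays and here-strings.
case "$1" in
OK)
    ;;
WARNING)
    ;;
UNKNOWN)
    ;;
CRITICAL)
    # Text to strip from the check output so only the service names remain
    crittext='CRITICAL: '
    autodeltext=' (auto), delayed ()'
    autotext=' (auto)'
    stoppedtext='=stopped'
    perfpipetext='|'
    stripped=${3//$crittext/}
    stripped=${stripped//$autodeltext/}
    stripped=${stripped//$autotext/}
    stripped=${stripped//$stoppedtext/}
    # Also drop the pipe that separates the output from the perfdata
    stripped=${stripped//$perfpipetext/}
    stripped=${stripped//, /,}
    # Split the comma-separated names into an array
    IFS=',' read -a array <<< "$stripped"
    # Wrap any name that contains a space in single quotes
    pattern='\ '
    for ((i=0; i<${#array[@]}; i++));
    do
        if [[ ${array[$i]} =~ $pattern ]]; then
            array[$i]="'${array[$i]}'"
        fi
    done
    # Re-join the names with commas and hand them to the remote restart command
    services=$(IFS=,; echo "${array[*]}")
    /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c restart_service -a "$services"
    ;;
esac
exit 0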
nsclient.ini
Code:
[/settings/default]
;A comma separated list of allowed hosts. You can use netmasks (/ syntax) or * to create ranges.
allowed hosts=[Nagios server IP]
[/modules]
;Various system related checks, such as CPU load, process state, service state memory usage and PDH counters.
CheckSystem=1
;Various file and disk related things.
CheckDisk=1
;Listens for incoming NRPE connection and processes incoming requests.
NRPEServer=1
;Execute external scripts
CheckExternalScripts=enabled
[/settings/NRPE/server]
allowed ciphers=ADH
;Allow characters in command definitions
allow nasty characters=1
;Allow -a arguments in command definitions
allow arguments=1
[/settings/external scripts]
;Allow arguments to be passed to external scripts
allow arguments=1
[/settings/external scripts/scripts]
;Reboot machine event handler
reboot_machine=scripts\reboot_machine.ps1
;Restart service event handler
restart_service=scripts\restart_service.bat "$ARG1$"
[/settings/log]
;Enable debug level logging
;file name = nsclient.log
;level = debug
Additionally, this test is successful. So would that point to the event handler not doing what it's supposed to if the check and restart_service script are working properly?
Code:
./check_nrpe -H [hostname] -p 5666 -t 30 -c restart_service -a MFCom,IMAService,'Citrix Encryption Service'
The Citrix MFCOM Service service is stopping.
The Citrix MFCOM Service service was stopped successfully.
The Citrix MFCOM Service service is starting.
The Citrix MFCOM Service service was started successfully.
The Citrix Independent Management Architecture service is stopping.
The Citrix Independent Management Architecture service was stopped successfully.
The Citrix Independent Management Architecture service is starting.
The Citrix Independent Management Architecture service was started successfully.
The Citrix Encryption Service service is stopping.
The Citrix Encryption Service service was stopped successfully.
The Citrix Encryption Service service is starting.
The Citrix Encryption Service service was started successfully.|
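Since the check and the restart command both work when run by hand, one way to narrow this down is to record what the event handler actually receives. The following is a sketch only: `log_handler_args`, the `HANDLER_LOG` variable, and the log path are hypothetical, and the function would be called near the top of the event handler script, before the `case` statement.

```shell
#!/bin/bash
# Hypothetical debugging helper: append a timestamp, the argument count and
# every argument (each in its own brackets) to a log file.
log_handler_args() {
    local logfile=$1
    shift
    {
        printf '%s argc=%d' "$(date '+%Y-%m-%d %H:%M:%S')" "$#"
        printf ' [%s]' "$@"
        printf '\n'
    } >> "$logfile"
}

# Example call with the arguments the handler is expected to receive.
# The log path is an assumption; pick one the nagios user can write to.
HANDLER_LOG=${HANDLER_LOG:-/tmp/restart_service_debug.log}
log_handler_args "$HANDLER_LOG" CRITICAL 192.0.2.10 'CRITICAL: Spooler=stopped (auto), delayed ()|'
```

If nothing is logged when the service goes critical, the handler is probably not being run at all. If the argument count is larger than 3, the check output is likely being split before it reaches the script.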
Re: NRPE: Automatic restart of multiple services
Posted: Wed Jul 15, 2015 3:25 pm
by jdalrymple
Did you try it with only 1 failed service for sure? We have to crawl before we can walk.