Event Handler for Restart-Service
Posted: Thu Aug 20, 2020 8:10 am
by kwhogster
Nagios Core 4.3.4
My Service
Code: Select all
define service {
host_name Desktop2
service_description Check All Service
check_command check_nrpe!checkservicestate! -a CheckAll exclude=RemoteRegistry exclude=sppsvc exclude=MapsBroker exclude=dbupdate exclude=DoSvc exclude=TrustedInstller exclude=gpsvc exclude=WbioSrvc exclude=edgeupdate
servicegroups AllServices
check_interval 60
notification_interval 60
event_handler restart_service
use generic-service
}
My command
Code: Select all
###############################################################################
#
# restart_service event handler
#
###############################################################################
define command{
command_name restart_service
command_line /usr/local/nagios/libexec/restart_service $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPTS$
}
My event handler, located in /usr/local/nagios/libexec/
Code: Select all
#!/bin/sh
case "$1" in
OK)
;;
WARNING)
;;
UNKNOWN)
;;
CRITICAL)
/usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c restart_service -a "$3"
;;
esac
exit 0
My nsclient.ini on the client test machine
Code: Select all
[/settings/external scripts/scripts]
; Restart Windows Service
restart_service = scripts\\restart_service.cmd $ARG1$
My restart_service.cmd
Code: Select all
@echo off
net stop %1
net start %1
@exit 0
Example
I stopped the spooler service on the test desktop computer and Nagios alerted me that the service was stopped and critical.
I was expecting the event handler to run the cmd file and start the service, but it did not.
Is my code correct?
First time trying event handlers.
Thank you
Tom
Re: Event Handler for Restart-Service
Posted: Fri Aug 21, 2020 2:59 pm
by kwhogster
Anyone with an idea?
Re: Event Handler for Restart-Service
Posted: Mon Aug 24, 2020 1:47 pm
by benjaminsmith
Hi,
The first place to check would be the Nagios log, to verify that the event handler script was called by the nagios service as defined in your configuration.
Code: Select all
tail -n 100 /usr/local/nagios/var/nagios.log
Once you have that verified, then it's likely an issue with the NSClient setup or file permissions.
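To narrow the tail output down to the lines that matter here, a small filter can help. This is only a sketch: filter_handler_lines is a name made up for illustration, and the two patterns assume the "SERVICE EVENT HANDLER" and "wproc" labels that Nagios Core 4 writes for event handler runs and worker-process messages.

```shell
# Keep only event-handler and worker-process lines from a Nagios log read on stdin.
# filter_handler_lines is a hypothetical helper name, not part of Nagios.
filter_handler_lines() {
    grep -E 'SERVICE EVENT HANDLER|wproc:'
}

# Example use (path as given above):
#   tail -n 100 /usr/local/nagios/var/nagios.log | filter_handler_lines
```

A "SERVICE EVENT HANDLER" line with no "wproc" error after it only shows that the handler was launched and exited cleanly, not that it did anything useful.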
Re: Event Handler for Restart-Service
Posted: Mon Aug 24, 2020 2:33 pm
by kwhogster
[1598297389] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;TGKW002;Check All Service;1598297386
[1598297389] SERVICE ALERT: TGKW002;Check All Service;CRITICAL;SOFT;1;CRITICAL: Spooler: stopped delayed ()
[1598297389] SERVICE EVENT HANDLER: TGKW002;Check All Service;CRITICAL;SOFT;1;restart_service
[1598297389] wproc: SERVICE EVENTHANDLER job 282 from worker Core Worker 12659 is a non-check helper but exited with return code 127
[1598297389] wproc: early_timeout=0; exited_ok=1; wait_status=32512; error_code=0;
[1598297389] wproc: stderr line 01: /bin/sh: 1: /usr/local/nagios/libexec/restart_service: not found
[1598297398] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;TGKW002;Check All Service;1598297396
[1598297398] SERVICE ALERT: TGKW002;Check All Service;CRITICAL;SOFT;2;CRITICAL: Spooler: stopped delayed ()
[1598297398] SERVICE EVENT HANDLER: TGKW002;Check All Service;CRITICAL;SOFT;2;restart_service
[1598297398] wproc: SERVICE EVENTHANDLER job 285 from worker Core Worker 12661 is a non-check helper but exited with return code 127
[1598297398] wproc: early_timeout=0; exited_ok=1; wait_status=32512; error_code=0;
[1598297398] wproc: stderr line 01: /bin/sh: 1: /usr/local/nagios/libexec/restart_service: not found
It looks like it is not able to find the script. Should I change it to
/usr/local/nagios/libexec/restart_service.sh?
Or is it something else?
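For what it is worth, return code 127 is the shell's way of saying that the command could not be found, so the path in command_line has to match the file name on disk exactly, extension included. A quick sketch for checking a path follows; check_handler_path is a hypothetical helper name, not something Nagios ships.

```shell
# Report whether a handler path exists and is executable by the current user.
# check_handler_path is a hypothetical helper for illustration only.
check_handler_path() {
    path="$1"
    if [ ! -e "$path" ]; then
        echo "missing: $path"
    elif [ ! -x "$path" ]; then
        echo "not executable: $path"
    else
        echo "ok: $path"
    fi
}

# Example use, run as the nagios user:
#   check_handler_path /usr/local/nagios/libexec/restart_service
#   check_handler_path /usr/local/nagios/libexec/restart_service.sh
```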
Re: Event Handler for Restart-Service
Posted: Mon Aug 24, 2020 3:42 pm
by kwhogster
OK, I added the .sh and now I do not get the error above.
But it still does not run the script. See this log:
[1598300928] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;TGKW002;Check All Service;1598300916
[1598300928] SERVICE ALERT: TGKW002;Check All Service;CRITICAL;SOFT;1;CRITICAL: Spooler: stopped delayed ()
[1598300928] SERVICE EVENT HANDLER: TGKW002;Check All Service;CRITICAL;SOFT;1;restart_service
[1598300933] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;TGKW002;Check All Service;1598300932
[1598300933] SERVICE ALERT: TGKW002;Check All Service;CRITICAL;SOFT;2;CRITICAL: Spooler: stopped delayed ()
[1598300933] SERVICE EVENT HANDLER: TGKW002;Check All Service;CRITICAL;SOFT;2;restart_service
[1598300993] SERVICE ALERT: TGKW002;Check All Service;CRITICAL;SOFT;3;CRITICAL: Spooler: stopped delayed ()
[1598300993] SERVICE EVENT HANDLER: TGKW002;Check All Service;CRITICAL;SOFT;3;restart_service
[1598301054] SERVICE ALERT: TGKW002;Check All Service;CRITICAL;SOFT;4;CRITICAL: Spooler: stopped delayed ()
[1598301054] SERVICE EVENT HANDLER: TGKW002;Check All Service;CRITICAL;SOFT;4;restart_service
[1598301113] SERVICE ALERT: TGKW002;Check All Service;CRITICAL;SOFT;5;CRITICAL: Spooler: stopped delayed ()
[1598301113] SERVICE EVENT HANDLER: TGKW002;Check All Service;CRITICAL;SOFT;5;restart_service
[1598301173] SERVICE ALERT: TGKW002;Check All Service;CRITICAL;SOFT;6;CRITICAL: Spooler: stopped delayed ()
[1598301173] SERVICE EVENT HANDLER: TGKW002;Check All Service;CRITICAL;SOFT;6;restart_service
[1598301233] SERVICE ALERT: TGKW002;Check All Service;CRITICAL;SOFT;7;CRITICAL: Spooler: stopped delayed ()
[1598301233] SERVICE EVENT HANDLER: TGKW002;Check All Service;CRITICAL;SOFT;7;restart_service
[1598301293] SERVICE ALERT: TGKW002;Check All Service;CRITICAL;SOFT;8;CRITICAL: Spooler: stopped delayed ()
[1598301293] SERVICE EVENT HANDLER: TGKW002;Check All Service;CRITICAL;SOFT;8;restart_service
[1598301353] SERVICE ALERT: TGKW002;Check All Service;CRITICAL;SOFT;9;CRITICAL: Spooler: stopped delayed ()
[1598301353] SERVICE EVENT HANDLER: TGKW002;Check All Service;CRITICAL;SOFT;9;restart_service
[1598301414] SERVICE ALERT: TGKW002;Check All Service;CRITICAL;HARD;10;CRITICAL: Spooler: stopped delayed ()
[1598301414] SERVICE EVENT HANDLER: TGKW002;Check All Service;CRITICAL;HARD;10;restart_service
root@tgcs017:/usr/local/nagios/etc/objects#
Any ideas?
Re: Event Handler for Restart-Service
Posted: Mon Aug 24, 2020 8:00 pm
by kwhogster
When I run the command manually from the Nagios server, it works:
root@tgcs017:/usr/local/nagios/libexec# /usr/local/nagios/libexec/check_nrpe -H "TGKW002" -p 5666 -c restart_service -a "spooler"
The Print Spooler service is stopping.
The Print Spooler service was stopped successfully.
The Print Spooler service is starting.
The Print Spooler service was started successfully.
So I believe nsclient.ini is set up correctly and the permissions are OK because, as you can see, the script runs.
Any ideas?
Re: Event Handler for Restart-Service
Posted: Tue Aug 25, 2020 8:29 am
by kwhogster
I updated restart_service.sh to this:
Code: Select all
#!/bin/sh
case "$1" in
OK)
;;
WARNING)
;;
UNKNOWN)
;;
CRITICAL)
case "$2" in
SOFT)
case "$3" in
3)
/usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c restart_service -a "$3"
;;
esac
;;
HARD)
/usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c restart_service -a "$3"
;;
esac
;;
esac
exit 0
Still no change.
Any ideas?
Re: Event Handler for Restart-Service
Posted: Tue Aug 25, 2020 3:05 pm
by gormank
Try testing as nagios?
root@tgcs017:/usr/local/na
Re: Event Handler for Restart-Service
Posted: Tue Aug 25, 2020 6:09 pm
by kwhogster
Gormank,
Try testing as nagios? I used su nagios.
It runs as the nagios account:
root@tgcs017:/usr/local/nagios/libexec# su nagios
nagios@tgcs017:/usr/local/nagios/libexec$ ls
chardetect check_dhcp check_esxi_hardware.py check_ifstatus check_mrtg check_ntp_time check_rpc check_tcp check_vmware_api.pl.backup2 nm_check_environment utils.pm
check_apt check_dig check_file_age check_imap check_mrtgtraf check_nwstat check_sensors check_time check_wave nm_check_uptime utils.sh
check_breeze check_disk check_flexlm check_ircd check_nagios check_oracle check_smtp check_udp check_win_eventlog.pl nm_check_version
check_by_ssh check_disk_smb check_ftp check_load check_nntp check_overcr check_snmp_cisco.pl check_ups mibcopy.py nm_notify_slack
check_cisco.pl check_dns check_http check_loadmaster.pl check_nrpe check_ping check_snmp_cisco.sh check_uptime mibdump.py patching_downtime.sh
check_clamd check_dummy check_icmp check_log check_nt check_pop check_snmp_synology check_users negate restart_service.sh
check_cluster check_esx3.pl check_ide_smart check_mailq check_ntp check_procs check_ssh check_vmware_api.pl nm_check_admin_up_oper_down ServerAlarmNotify.php
check_dfs.sh check_esx3.pl.backup check_ifoperstatus check_mem.sh check_ntp_peer check_real check_swap check_vmware_api.pl.backup nm_check_asa_connections urlize
nagios@tgcs017:/usr/local/nagios/libexec$ /usr/local/nagios/libexec/check_nrpe -H "TGKW002" -p 5666 -c restart_service -a "spooler"
The Print Spooler service is stopping.
The Print Spooler service was stopped successfully.
The Print Spooler service is starting.
The Print Spooler service was started successfully.
nagios@tgcs017:/usr/local/nagios/libexec$
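The manual runs above call check_nrpe directly, with the host and service name typed in by hand, so they do not exercise the handler script's own argument handling. A possible next step is to run the handler script itself with the same three values the command definition passes ($SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPTS$) and let the shell trace what it executes. This is a sketch; trace_handler is a made-up helper name.

```shell
# Run an event-handler script under "sh -x" so each executed command is echoed.
# trace_handler is a hypothetical helper; the first argument is the script path,
# the rest are passed through to the script unchanged.
trace_handler() {
    script="$1"
    shift
    sh -x "$script" "$@" 2>&1
}

# Example use, as the nagios user:
#   trace_handler /usr/local/nagios/libexec/restart_service.sh CRITICAL SOFT 3
```

The traced check_nrpe line shows which values actually end up after -H and -a.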
Re: Event Handler for Restart-Service
Posted: Wed Aug 26, 2020 5:12 pm
by benjaminsmith
Hi,
Since it's not a permissions issue, perhaps the logic in the event handler is not working as expected. You can verify this by creating a simple event handler that directly runs the restart script.
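One way to do that is sketched below, under some assumptions. The command definition earlier in the thread passes $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPTS$, so inside the script "$2" is SOFT or HARD and "$3" is the attempt number, not a host and a Windows service name. The sketch assumes the command_line is extended with two more arguments: $HOSTADDRESS$ and the name of the service to restart (how that name is supplied is left open; it is a hypothetical fifth argument). CHECK_NRPE and handle_event are names introduced here for illustration.

```shell
#!/bin/sh
# Sketch only. Assumed command_line (not from the original posts):
#   restart_service.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPTS$ $HOSTADDRESS$ <service name>

# Path to check_nrpe; can be overridden from the environment for dry runs.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

handle_event() {
    state="$1"; state_type="$2"; attempt="$3"; host="$4"; service="$5"
    case "$state" in
    CRITICAL)
        # Restart on the third soft attempt, or once the state is hard.
        if [ "$state_type" = "HARD" ] || [ "$attempt" = "3" ]; then
            "$CHECK_NRPE" -H "$host" -p 5666 -c restart_service -a "$service"
        fi
        ;;
    esac
    # Always report success to Nagios, as the original script does.
    return 0
}

handle_event "$@"
```

Setting CHECK_NRPE=echo before running the script by hand prints the check_nrpe arguments instead of contacting the client, which makes it easy to confirm that the host and service name land where expected.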