Yeah, here's my test setup:
I have the Nagios XI trial VM installed in VirtualBox, running on my desktop. I have turned off or disabled all firewalls and antivirus applications on my host (the desktop) to rule out anything blocking the traffic.
From my VM I can run the basic check against my desktop with no issue, and I can also trigger the remote script and pass arguments to it:
Code:
./check_nrpe -H host_ip -c runcmd -a Spooler
I then created a script within the VM to act as my event handler (code below). It also works when I run it by hand with arguments:
Code:
./servicerestart.sh "CRITICAL" 192.168.1.103 Spooler
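servicerestart.sh switches on its first argument, a state name that Nagios fills in from $SERVICESTATE$. Under the standard Nagios plugin convention, that name comes from the check's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Here is a small sketch for translating a manual check run into the same word. The `state_name` function is hypothetical and not part of Nagios.

```shell
#!/bin/sh
# Sketch: translate a plugin exit code into the state name that Nagios
# passes to an event handler via $SERVICESTATE$ (standard plugin codes).
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;  # 3, and anything out of range, is reported as UNKNOWN
    esac
}
```

For example, you can run the Spooler service's own check command by hand and then run `state_name $?`. That shows which word would reach servicerestart.sh as its first argument.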
With the manual testing done, I figured I was ready to see whether Nagios XI would trigger an event handler, so I created a service monitor. Once it was monitoring Spooler and correctly reporting it as up or down, I went back to CCM to add the event handler. I created a command that reads:
Code:
$USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ $_SERVICE$
I set it as a misc command, made it active, and saved it. Back on the Commands screen, I applied the configuration to be safe. I then went to Services and edited the Print Spooler monitor I had created earlier. I selected my new Service Restart command as its event handler, turned event handling on, and added a custom variable definition of _SERVICE = Spooler. I saved all of that, applied the configuration, and moved on to testing.
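The Nagios Core documentation on custom object variables describes how those variables become macros. The leading underscore is dropped, the name is uppercased, and it is prefixed with _HOST, _SERVICE or _CONTACT depending on the object type. By that rule, a service variable named _SERVICE would be referenced as $_SERVICESERVICE$ rather than $_SERVICE$. That difference is worth checking against the command definition above. The sketch below only applies the naming rule. The helper name `custom_service_macro` is hypothetical, and the behaviour should be confirmed against the Nagios XI version in use.

```shell
#!/bin/sh
# Sketch: derive the on-demand macro Nagios uses for a custom *service*
# variable (leading underscore dropped, name uppercased, prefixed with
# _SERVICE). Hypothetical helper, not a Nagios tool.
custom_service_macro() {
    # Strip the leading underscore and uppercase the remainder.
    name=$(printf '%s' "$1" | sed 's/^_//' | tr '[:lower:]' '[:upper:]')
    printf '$_SERVICE%s$\n' "$name"
}

# The variable defined on the Print Spooler service above:
custom_service_macro _SERVICE
```

If the rule holds for this setup, the command line would read `$USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ $_SERVICESERVICE$`.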
At this point I manually stopped the Spooler service on my desktop and let Nagios find out on its own. It waited, checked, and detected that the service was down, but no event handler launched. I let it go through check attempts 1-5 and still nothing happened. I checked var/nagios.log and found nothing strange, except that it contains no event handler entries at all (I'm not sure whether it's supposed to).
In nsclient.log I found a number of SSL errors. Those may have something to do with it, but I don't know for sure, and I haven't found much documentation on either Nagios or NSClient++. I tried disabling SSL for NRPE in nsc.ini (although the setting was already commented out) and tried again. That time it gave me the following error:
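Whether an event handler line should appear in nagios.log depends on the log_event_handlers directive in nagios.cfg. Handlers only run at all when enable_event_handlers is on there as well as on the service itself. Assuming the usual Nagios Core log format, a handler run shows up as a line containing "SERVICE EVENT HANDLER:" followed by host;service;state;state type;attempt;command. The two helpers below are hypothetical. The log format they match is an assumption worth verifying against a real log.

```shell
#!/bin/sh
# Sketch: check the global event handler settings and count logged runs.
# Assumes nagios.cfg uses key=value lines and that handler runs are logged
# as "SERVICE EVENT HANDLER: host;service;state;type;attempt;command".

# Print the event handler directives from a nagios.cfg file.
show_handler_settings() {
    grep -E '^(enable_event_handlers|log_event_handlers)=' "$1"
}

# Count logged event handler runs for one service description.
# Prints 0 when the handler never ran (grep -c still prints the count).
count_handler_runs() {
    grep -F 'SERVICE EVENT HANDLER:' "$1" | grep -cF ";$2;"
}
```

On an XI install these files typically live under /usr/local/nagios, for example `show_handler_settings /usr/local/nagios/etc/nagios.cfg` and `count_handler_runs /usr/local/nagios/var/nagios.log "Print Spooler"`.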
Code:
message:modules\NRPEListener\NRPEListener.cpp:370: Could not read a full NRPE packet from socket, only got: 77
I also frequently see this error, with or without SSL enabled:
Code:
error:CACHEmodules\NRPEListener\NRPEListener.cpp:70: No scripts found in path: scripts\*.*
...which is strange, because my runcmd.bat file is in the NSClient++\scripts\ directory.
After going back over all of my work, I finally noticed a "Failed to Sync" error in the Services panel of the Core Configuration Manager. I don't know what the Sync Status column is for, but my guess is that it MAY have something to do with all of this. I also have a feeling that once I fix one problem, the rest will start working as well. Sorry for the wall of text, but with something like this I figured I'd be as detailed as possible so that any human error can be pointed out.
servicerestart.sh:
Code:
#!/bin/sh
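# Event Handler for restarting Windows Services
# $1 = service state ($SERVICESTATE$), $2 = host address ($HOSTADDRESS$),
# $3 = name of the Windows service to restart
case "$1" in
    OK)
        ;;
    WARNING)
        ;;
    UNKNOWN)
        ;;
    CRITICAL)
        # Ask NSClient++ on the Windows host to run runcmd with the service name
        /usr/local/nagios/libexec/check_nrpe -H "$2" -t 120 -c runcmd -a "$3"
        ;;
esac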
exit 0
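One way to tell whether Nagios is calling the handler at all, and with what arguments, is to have the script record every invocation. The sketch below is a variant of servicerestart.sh that appends each call to a log file. It can also print the check_nrpe command instead of running it. HANDLER_LOG, DRY_RUN and CHECK_NRPE are hypothetical environment variables added for debugging, not Nagios settings. The check_nrpe path and arguments are the ones from the script above, and the log file must be writable by the user Nagios runs as.

```shell
#!/bin/sh
# Debugging variant of servicerestart.sh (sketch).
# $1 = service state, $2 = host address, $3 = Windows service name.
# HANDLER_LOG, DRY_RUN and CHECK_NRPE are hypothetical debugging knobs.

HANDLER_LOG=${HANDLER_LOG:-/tmp/servicerestart.log}
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

handle_event() {
    state=$1
    host=$2
    svc=$3

    # Record every call: an empty log means Nagios never ran the handler,
    # and an empty or unexpected service= value suggests a macro problem.
    printf '%s state=%s host=%s service=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$state" "$host" "$svc" >> "$HANDLER_LOG"

    case "$state" in
        CRITICAL)
            if [ -z "$svc" ]; then
                echo "no service name given" >> "$HANDLER_LOG"
                return 1
            fi
            if [ -n "$DRY_RUN" ]; then
                # Print the command instead of running it.
                echo "$CHECK_NRPE -H $host -t 120 -c runcmd -a $svc"
            else
                "$CHECK_NRPE" -H "$host" -t 120 -c runcmd -a "$svc"
            fi
            ;;
        *)
            # OK, WARNING, UNKNOWN: nothing to do.
            ;;
    esac
    return 0
}

# Only act when called with arguments (as Nagios does).
if [ "$#" -gt 0 ]; then
    handle_event "$@"
    exit $?
fi
```

Running it by hand as `DRY_RUN=1 ./servicerestart.sh CRITICAL 192.168.1.103 Spooler` shows the command it would send. Checking /tmp/servicerestart.log after a real outage shows whether Nagios invoked it.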