A couple seemingly basic issues I can't seem to overcome...

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jbwaclawski
Posts: 4
Joined: Fri Dec 30, 2011 11:47 am


Post by jbwaclawski »

I'm working with the Nagios XI trial VM image and have run into a few obnoxious issues.

1) I am continuously receiving the following error in my NSClient++ log file (version 0.3.9.330 2011-09-02 x64):

Code:

error:include\Socket.h:713: Error: Could not complete SSL handshake : [-1] 1, attempting to resume...
I don't know what to try with this aside from enabling/disabling SSL. Any thoughts would be very helpful.

2) Event handlers just don't work, or I'm missing a very small, very crucial step. I've followed the basic configuration described here: http://assets.nagios.com/downloads/nagi ... h_NRPE.pdf . My .bat file works on Windows, and my shell script works from Linux when I run ./servicerestart.sh "CRITICAL" 192.168.1.103 Spooler, but the service check won't trigger the handler in any state at all. I don't know whether it has something to do with my SSL problem.
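For the SSL handshake error in point 1, one low-risk test is to query the client from the XI VM twice: once with SSL (check_nrpe's default) and once with check_nrpe's -n switch, which disables SSL on the check_nrpe side. The two ends generally have to agree on SSL, so comparing the replies shows which mode the NSClient++ listener is actually using. This is a sketch: compare_ssl and the CHECK_NRPE override are hypothetical names, and the default path is the one servicerestart.sh uses later in this thread.

```shell
#!/bin/sh
# Hypothetical helper: ask the NRPE listener for its version twice, once with
# SSL (check_nrpe's default) and once with -n (SSL disabled), and print the
# exit status and reply of each attempt.
# CHECK_NRPE is an assumed override; the default is the path servicerestart.sh uses.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

compare_ssl() {
    host="$1"
    # First attempt: default mode (SSL on)
    ssl_out=$("$CHECK_NRPE" -H "$host" 2>&1)
    ssl_rc=$?
    # Second attempt: -n turns SSL off on the check_nrpe side
    plain_out=$("$CHECK_NRPE" -n -H "$host" 2>&1)
    plain_rc=$?
    printf 'ssl rc=%s: %s\n' "$ssl_rc" "$ssl_out"
    printf 'no-ssl rc=%s: %s\n' "$plain_rc" "$plain_out"
}
```

After sourcing the file, run compare_ssl 192.168.1.103 from the VM. If only one of the two lines shows a normal version reply, that is the mode to use consistently on both ends.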

Setup:
>> Nagios XI VM image, running CentOS 6.3, hosted in VirtualBox
>> Time on both NagiosXI and client sync'd to same public NTP pool
>> Client is Windows 7 x64, missing a few updates
>> Using NRPE within NSClient++ v. 0.3.9.330 2011-09-02 x64 (./check_nrpe -H 192.168.1.103 works fine)

There's a start, thanks for the help and let me know if you have any questions.

Re: A couple seemingly basic issues I can't seem to overcome

Post by jbwaclawski »

Add this to the list:

3) The monitor I created to poll the Print Spooler won't sync (Core Configuration Manager >> Services >> Service Status), and I don't know why. I've tried deleting and recreating the monitor, restarting Nagios, and restarting the server completely; nothing has helped. I'm not sure whether this is related to either of the other problems, but it's driving me nuts.
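One way to see whether the Print Spooler service and its event handler actually reached the config files Nagios loads is to look for the service definition in the generated object files. This is a sketch under assumptions: find_event_handler is a hypothetical name, and you have to pass in the directory where your CCM writes service definitions (the location on your XI install is not confirmed here).

```shell
#!/bin/sh
# Hypothetical helper: scan generated Nagios object files for "define service"
# blocks whose service_description matches $2 and print each block's
# event_handler (or NONE if the block has no event_handler directive).
# ASSUMPTION: $1 is the directory where your CCM writes service .cfg files.
find_event_handler() {
    cfg_dir="$1"
    svc="$2"
    awk -v svc="$svc" '
        /^[ \t]*define[ \t]+service[ \t]*[{]/ { inblk = 1; desc = ""; eh = "NONE"; next }
        inblk && $1 == "service_description" {
            desc = $0
            sub(/^[ \t]*service_description[ \t]+/, "", desc)  # keep multi-word names
            sub(/[ \t]+$/, "", desc)
        }
        inblk && $1 == "event_handler" { eh = $2 }
        inblk && /^[ \t]*[}]/ {
            if (desc == svc) { print FILENAME ": event_handler=" eh; found = 1 }
            inblk = 0
        }
        END { exit (found ? 0 : 1) }  # non-zero when no matching service was found
    ' "$cfg_dir"/*.cfg
}
```

If it prints event_handler=NONE, or finds no matching service at all, the CCM changes apparently never made it to disk, which would be consistent with a "Failed to Sync" status.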
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: A couple seemingly basic issues I can't seem to overcome

Post by slansing »

1) Check your firewall and make sure that port 5666 (NRPE uses TCP) is open on both your Nagios server and the destination IP.

2) If you are unable to get any checks through to your Windows server, this will not work. But you did mention that the following works:

Code:

./check_nrpe -H xxx.xxx.xx.x -p 5666 -c runcmd -a spooler
Correct?
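To separate a firewall problem from an NRPE problem, the standard check_tcp plugin can confirm that TCP 5666 accepts connections before check_nrpe is tried on the same port. A minimal sketch, assuming both plugins live in /usr/local/nagios/libexec (the path servicerestart.sh uses); check_nrpe_port and the CHECK_TCP/CHECK_NRPE overrides are hypothetical names added so the function can be dry-run.

```shell
#!/bin/sh
# Hypothetical helper: first confirm the TCP port accepts connections
# (check_tcp), then confirm NRPE answers on it (check_nrpe).
# Return codes: 0 = OK, 1 = port unreachable, 2 = port open but NRPE failed.
CHECK_TCP="${CHECK_TCP:-/usr/local/nagios/libexec/check_tcp}"
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

check_nrpe_port() {
    host="$1"
    port="${2:-5666}"
    if ! "$CHECK_TCP" -H "$host" -p "$port" >/dev/null 2>&1; then
        echo "port $port closed or filtered on $host (check firewalls)"
        return 1
    fi
    if ! "$CHECK_NRPE" -H "$host" -p "$port" >/dev/null 2>&1; then
        echo "port $port open, but NRPE did not answer cleanly (check SSL settings on both ends)"
        return 2
    fi
    echo "NRPE on $host:$port OK"
}
```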

Re: A couple seemingly basic issues I can't seem to overcome

Post by jbwaclawski »

Yeah, here's my test setup:

I have the Nagios XI trial VM installed in Virtualbox running on my desktop. I have all firewalls and AV applications on my host (desktop) turned off and/or disabled so as to limit possible blockages.

I can do the basic check to my desktop from my VM with no issue...

Code:

./check_nrpe -H host_ip 
...I can even activate the remote script without issue as well as pass arguments to it...

Code:

./check_nrpe -H host_ip -c runcmd

Code:

./check_nrpe -H host_ip -c runcmd -a Spooler
I then created a script within the VM to use as the trigger of my event handler (code below). This script even works when I pass arguments to it...

Code:

./servicerestart.sh "CRITICAL" 192.168.1.103 Spooler
After all of the manual testing was done I figured I was ready to test the application's ability to trigger an event handler, so I created a service monitor. Once I had it monitoring Spooler and received data showing that it was up/down properly, I moved back to CCM to add the event handler information. I made a command that read as such:

Code:

$USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ $_SERVICE$
...set it as a misc command, made it active, and saved. Back on the commands screen, I applied the configuration to be safe. I then went to Services and edited the monitor I had created for the Print Spooler: I selected my new Service Restart command as its event handler, turned event handlers on, and added a custom variable definition _SERVICE = Spooler. I saved all of that, applied the config, and moved on to testing.

At this point I manually stopped the Spooler service on my desktop and let Nagios find out on its own. It waited, it checked, it detected that the service was down, but no event handler launched. I let it go through attempts 1-5 and still, nothing.

I checked var/nagios.log and found nothing strange, other than the fact that there is no event handler information in it (not sure whether there's supposed to be). I checked nsclient.log and found all of the SSL errors. Those may have something to do with it, but I don't know for sure; there isn't a lot of documentation out there on Nagios or NSClient++. I tried going into nsc.ini and disabling SSL for NRPE (though the setting was already commented out) and trying again. That time it gave me the following error:

Code:

message:modules\NRPEListener\NRPEListener.cpp:370: Could not read a full NRPE packet from socket, only got: 77
I also frequently see this error, with or without SSL on:

Code:

error:CACHEmodules\NRPEListener\NRPEListener.cpp:70: No scripts found in path: scripts\*.*
...which is strange, because my runcmd.bat file is in the NSClient++\scripts\ directory.
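On the nagios.log question: Nagios only writes event handler runs to its log when log_event_handlers is enabled in nagios.cfg, and handlers do not run at all when enable_event_handlers is 0. The sketch below prints both settings and greps the log for SERVICE EVENT HANDLER entries for one service. event_handler_report is a hypothetical name, and the log line format shown in the comment is approximate.

```shell
#!/bin/sh
# Hypothetical helper: show the nagios.cfg switches that control event handlers,
# then list logged SERVICE EVENT HANDLER runs for one service.
#   $1 = path to nagios.cfg, $2 = path to nagios.log, $3 = service description
event_handler_report() {
    cfg="$1"
    log="$2"
    svc="$3"
    # enable_event_handlers=0 disables handlers globally;
    # log_event_handlers=0 only keeps their runs out of the log.
    grep -E '^(enable|log)_event_handlers=' "$cfg"
    # Logged runs look roughly like:
    #   [timestamp] SERVICE EVENT HANDLER: host;service;CRITICAL;SOFT;1;command
    grep "SERVICE EVENT HANDLER: [^;]*;$svc;" "$log" \
        || echo "no event handler runs logged for '$svc'"
}
```

For example: event_handler_report /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/var/nagios.log "Print Spooler" (adjust the paths if your install differs).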

I finally noticed the "Failed to Sync" error on the Services panel in the Core Configuration Manager after going back and checking all of my work. I have no idea what the Sync Status column is for, but my guess is that it MAY have something to do with all of this. I also have a feeling that once I fix one problem, the rest will start working as well. Sorry for the outrageous wall of text, but with something like this I figured I should be as detailed as possible so that any human error can be pointed out.

servicerestart.sh:

Code:

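#!/bin/sh
# Event handler for restarting Windows services via NRPE
# Usage: servicerestart.sh <service state> <host address> <Windows service name>

case "$1" in
        OK)
                ;;
        WARNING)
                ;;
        UNKNOWN)
                ;;
        CRITICAL)
                # Ask NSClient++ to run runcmd with the service name as its argument
                /usr/local/nagios/libexec/check_nrpe -H "$2" -t 120 -c runcmd -a "$3"
                ;;
esac
exit 0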
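A variation on servicerestart.sh, modeled on the sample event handler in the Nagios Core documentation, also takes the state type and attempt number so the restart is tried once during the SOFT states and again when the state turns HARD, rather than on every check. It also refuses to run with an empty service name, as a guard against a macro that expands to nothing. This is a sketch, not a drop-in: the extra arguments assume a command line like $USER1$/servicerestart.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ $_SERVICESERVICE$, the attempt number 3 is borrowed from the docs example, and CHECK_NRPE is a hypothetical override for dry runs.

```shell
#!/bin/sh
# Sketch: servicerestart.sh extended with state type and attempt number.
# Assumed args: $1 state, $2 state type, $3 attempt, $4 host address, $5 service name
# CHECK_NRPE is a hypothetical override (e.g. CHECK_NRPE=echo for a dry run).
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

restart_windows_service() {
    state="$1"; state_type="$2"; attempt="$3"; host="$4"; service="$5"
    if [ -z "$service" ]; then
        echo "servicerestart: no service name given" >&2
        return 1
    fi
    case "$state" in
        CRITICAL)
            case "$state_type" in
                SOFT)
                    # Try once on the 3rd soft attempt (value taken from the docs sample)
                    if [ "$attempt" = "3" ]; then
                        "$CHECK_NRPE" -H "$host" -t 120 -c runcmd -a "$service"
                    fi
                    ;;
                HARD)
                    # The state has gone HARD: try the restart again
                    "$CHECK_NRPE" -H "$host" -t 120 -c runcmd -a "$service"
                    ;;
            esac
            ;;
    esac
    return 0
}

# Only act when called with arguments (as Nagios does), so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    restart_windows_service "$@"
    exit $?
fi
```

Dry run: CHECK_NRPE=echo ./servicerestart.sh CRITICAL HARD 5 192.168.1.103 Spooler prints the check_nrpe arguments instead of contacting the client.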
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: A couple seemingly basic issues I can't seem to overcome

Post by scottwilkerson »

jbwaclawski wrote: After all of the manual testing was done I figured I was ready to test the application's ability to trigger an event handler, so I created a service monitor. Once I had it monitoring Spooler and received data showing that it was up/down properly, I moved back to CCM to add the event handler information. I made a command that read as such:

Code:

$USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ $_SERVICE$

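This was close, but a custom variable defined on a service is referenced as $_SERVICE followed by the variable name without its leading underscore, so your _SERVICE variable becomes $_SERVICESERVICE$. The command should be: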

Code:

$USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ $_SERVICESERVICE$
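The doubled name falls out of the custom variable macro rule: the macro is $_ plus the object type (HOST, SERVICE or CONTACT) plus the variable name without its leading underscore, and custom variable names are treated case-insensitively (upper-cased). The tiny helper below just spells that rule out; custom_macro is a hypothetical name for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: build the macro name Nagios uses for a custom variable.
#   $1 = object type (HOST, SERVICE or CONTACT)
#   $2 = custom variable as defined in the object, e.g. _SERVICE
custom_macro() {
    object_type="$1"
    var="$2"
    # Drop the leading underscore and upper-case the rest
    name=$(printf '%s' "${var#_}" | tr '[:lower:]' '[:upper:]')
    printf '$_%s%s$\n' "$object_type" "$name"
}
```

custom_macro SERVICE _SERVICE prints $_SERVICESERVICE$, which for the service in this thread expands to Spooler.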
Former Nagios employee