NRPE Errors

Posted: Wed Jan 31, 2018 9:02 am
by crrussell3
We seem to be having some issues with Nagios NRPE checks coming back with bad results:

Examples:

1. CHECK_NRPE: Error - Could not complete SSL handshake.
At times, checks will come back with this result. If you force a recheck, it comes back without issue. What could be causing this? Everything I read online points to a config issue in the nsclient.ini file, such as SSL not being disabled or the Nagios host missing from the allowed hosts.

2. (No output on stdout) stderr: connect to address 10.192.1.190 port 5666: No route to host
We also receive this error randomly. It looks like a routing/networking error, but the Nagios server and the monitored server sit on the same subnet/VLAN, and even on the same Hyper-V host and virtual switch, so there shouldn't be a networking problem. Nothing but Nagios reports this issue. We don't see drops in communication from apps to db servers during this time, or any other signs of communication issues.

If anyone has any guidance on these issues, it would be appreciated.

Thanks!

Re: NRPE Errors

Posted: Wed Jan 31, 2018 1:15 pm
by npolovenko
Hello, @crrussell3. I'd try increasing the timeout on these two checks that are giving you errors by adding -t 60 to their commands.
Can you also upload the nsclient.ini and nsclient.log files here? They should be located in the same folder on the Windows server.
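
For the intermittent "No route to host" result, it may also help to log raw TCP connection attempts to port 5666 from the Nagios server, independent of NRPE and SSL. Below is a minimal Python sketch of that idea. The function names `probe` and `watch`, the 10-second interval, and the attempt count are assumptions made for illustration and are not part of Nagios or NSClient++.

```python
import errno
import socket
import time


def probe(host, port=5666, timeout=5.0):
    """Try one TCP connection and return a short label for the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # errno.EHOSTUNREACH is the "No route to host" case from the alert.
        return errno.errorcode.get(exc.errno, f"error: {exc}")


def watch(host, port=5666, interval=10.0, attempts=360):
    """Probe repeatedly and print every non-ok result with a timestamp."""
    for _ in range(attempts):
        result = probe(host, port)
        if result != "ok":
            print(time.strftime("%Y-%m-%d %H:%M:%S"), host, port, result)
        time.sleep(interval)


if __name__ == "__main__":
    # Address taken from the error message in the first post.
    watch("10.192.1.190")
```

If the timestamps of any `EHOSTUNREACH` lines match the Nagios alerts, the problem is likely below NRPE (for example ARP resolution or a firewall rejecting the connection) rather than in the NSClient++ configuration.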

Re: NRPE Errors

Posted: Wed Jan 31, 2018 4:40 pm
by crrussell3
Here are the two files requested.

Here is a copy of the alert we received:

Code: Select all

Nagios has detected a problem with this service.

Notification Type: PROBLEM

Service: CORPHVDB-Cluster1 Compellent1 ECOMSQL LOGS Free Space
Host: corphvdb1-h.hy-vee.net
Address: 10.215.20.91
State: CRITICAL
Info:
CHECK_NRPE: Error - Could not complete SSL handshake.
Date/Time: 2018-01-30 15:44:43

Respond: https://nagiosprod1-v/nagiosxi/rr.php?uid=23-754-b40d40b4ddd2b6949d3c15d4809fc480
Nagios URL: https://nagiosprod1-v/nagiosxi/

Notes: Escalate ticket to Systems team 24x7. If after hours, contact the Systems on-call phone number.

We have already set the default timeout on checks to 60 seconds.

Re: NRPE Errors

Posted: Wed Jan 31, 2018 4:59 pm
by npolovenko
@crrussell3, Do you know what version of NSClient you have installed? I recommend upgrading to the latest version they have on their website: 0.5.2.35.
Also, I've seen some memory-related errors in the log file. I'm not sure whether that is a memory allocation issue on the NSClient side or whether your Windows server is actually running out of RAM. I'd look into that.

Re: NRPE Errors

Posted: Wed Jan 31, 2018 5:11 pm
by crrussell3
On this particular server, I am running NSClient++ 0.5.1.044.

This is a Hyper-V Cluster host, but I haven't seen alerts for it being low on available memory itself.

Re: NRPE Errors

Posted: Wed Jan 31, 2018 5:49 pm
by npolovenko
@crrussell3, Can you add this section to your nsclient.ini file to see if that fixes the issue? You'll need to restart the NSClient service after you make the changes. Also, have you upgraded NSClient recently? If so, was everything working normally before?

Code: Select all

[/settings/external scripts]
allow arguments = 1
allow nasty characters = 1
timeout = 90

Re: NRPE Errors

Posted: Thu Feb 01, 2018 11:24 am
by crrussell3
Currently I have:

Code: Select all

[/settings/external scripts]
allow arguments = true
allow nasty characters = true

Is there a difference between using "true" or "1"?

I have not upgraded versions. The SSL error doesn't occur too often; the "No route to host" error happens a little more often than the SSL one.

Re: NRPE Errors

Posted: Thu Feb 01, 2018 11:31 am
by npolovenko
@crrussell3, true instead of 1 should be ok. Did you add the timeout value? After that please restart the NSClient++ service to see if the problem was fixed.
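
As an illustration of why the two spellings are usually interchangeable in INI-style files, here is a small Python sketch that reads both variants of the section and compares the parsed values. It uses Python's standard `configparser`, not NSClient++'s own parser, so it only demonstrates the common convention and does not prove how NSClient++ behaves. The helper name `read_flags` is made up for this example.

```python
import configparser

# Two variants of the section from this thread, one per spelling.
WITH_ONE = """
[/settings/external scripts]
allow arguments = 1
allow nasty characters = 1
timeout = 90
"""

WITH_TRUE = """
[/settings/external scripts]
allow arguments = true
allow nasty characters = true
timeout = 90
"""


def read_flags(ini_text):
    """Parse the external scripts section into typed Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["/settings/external scripts"]
    return {
        "allow arguments": section.getboolean("allow arguments"),
        "allow nasty characters": section.getboolean("allow nasty characters"),
        "timeout": section.getint("timeout"),
    }


# Both spellings parse to the same boolean values here.
print(read_flags(WITH_ONE) == read_flags(WITH_TRUE))
```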

Re: NRPE Errors

Posted: Thu Feb 01, 2018 11:39 am
by crrussell3
I added the timeout and restarted the service.

I will monitor to see if I get as many SSL errors.

Looking through my state history, I can see that this particular server has had the SSL error 17 times across multiple checks.
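
A rough way to tally both error types from exported check output is sketched below in Python. It assumes the status text of each state change is available as one string per entry. The `PATTERNS` labels and the `count_errors` helper are hypothetical names, and the "OK" entry in the sample is a placeholder rather than real output.

```python
from collections import Counter

# Substrings taken from the two error messages quoted in this thread.
PATTERNS = {
    "ssl_handshake": "Could not complete SSL handshake",
    "no_route": "No route to host",
}


def count_errors(outputs):
    """Count how many status lines contain each known error substring."""
    counts = Counter()
    for line in outputs:
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
    return counts


sample = [
    "CHECK_NRPE: Error - Could not complete SSL handshake.",
    "(No output on stdout) stderr: connect to address 10.192.1.190 port 5666: No route to host",
    "OK",  # placeholder for a healthy result, not real plugin output
]

for label, total in sorted(count_errors(sample).items()):
    print(label, total)
```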

Re: NRPE Errors

Posted: Thu Feb 01, 2018 2:18 pm
by npolovenko
@crrussell3, Sounds good. Let us know if you still experience problems with those two hosts.

In [/settings/NRPE/server] please also add this:

Code: Select all

use SSL = 1

Then restart the NSClient service.

What is the version of check_nrpe plugin on the NagiosXI side? To check you can run this command:

Code: Select all

/usr/local/nagios/libexec/check_nrpe

The version number is shown at the top of the output.