NRPE Errors

crrussell3
Posts: 31
Joined: Tue Oct 10, 2017 9:09 am

NRPE Errors

Post by crrussell3 »

We seem to be having some issues with Nagios NRPE checks coming back with bad results:

Examples:

1. CHECK_NRPE: Error - Could not complete SSL handshake.
At times, checks come back with this result. When you force a recheck, it comes back without issue. What could be causing this? Everything I read online points to a config issue in the nsclient.ini file, such as not disabling SSL or not listing the correct allowed Nagios host.

2. (No output on stdout) stderr: connect to address 10.192.1.190 port 5666: No route to host
We also receive this error at random. It looks like a routing/networking error, but the Nagios server and the monitored server sit on the same subnet/VLAN, and even on the same Hyper-V host and virtual switch, so there shouldn't be a networking problem. Nothing but Nagios reports this issue. We don't see drops in communication from apps to DB servers during this time, nor any other signs of communication issues.
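One way to check whether the "No route to host" result shows up outside of Nagios is to probe TCP port 5666 directly and record the OS-level error. The following is a minimal Python sketch under that assumption; `probe_once` and `probe_loop` are hypothetical helper names, not part of Nagios or NSClient++. On Linux, "No route to host" is reported as errno `EHOSTUNREACH`.

```python
import errno
import socket
import time


def probe_once(host, port=5666, timeout=5.0):
    """Attempt one TCP connect and return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        return False, "timed out"
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. EHOSTUNREACH.
        name = errno.errorcode.get(exc.errno, "unknown") if exc.errno else "unknown"
        return False, f"{name}: {exc}"


def probe_loop(host, port=5666, attempts=10, interval=1.0):
    """Probe repeatedly and return a list of (timestamp, detail) for failures."""
    failures = []
    for _ in range(attempts):
        ok, detail = probe_once(host, port)
        if not ok:
            failures.append((time.strftime("%Y-%m-%d %H:%M:%S"), detail))
        time.sleep(interval)
    return failures
```

Running `probe_loop("10.192.1.190")` from the Nagios server over a longer window would show whether the failures line up with the alert timestamps.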

If anyone has any guidance on these issues, it would be appreciated.

Thanks!
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: NRPE Errors

Post by npolovenko »

Hello, @crrussell3. I'd try increasing the timeout on these two checks that are giving you errors by adding -t 60 to their commands.
Can you also upload the nsclient.ini and nsclient.log files here? They should be located in the same folder on the Windows server.
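For reference, here is a rough Python sketch of what a check_nrpe invocation with `-t 60` looks like when run by hand. The plugin path is the one used elsewhere in this thread; `check_cpu` is only a placeholder command name, and `build_check_nrpe` is a hypothetical helper, so adjust both to match your own command definitions.

```python
import shlex

# Plugin path as used elsewhere in this thread; adjust if yours differs.
CHECK_NRPE = "/usr/local/nagios/libexec/check_nrpe"


def build_check_nrpe(host, command, timeout=60, plugin=CHECK_NRPE):
    """Build the argument list for check_nrpe with an explicit -t timeout."""
    return [plugin, "-H", host, "-t", str(timeout), "-c", command]


def as_shell(argv):
    """Render an argument list as a shell-safe command line for copy/paste."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Printing `as_shell(build_check_nrpe("10.192.1.190", "check_cpu"))` gives a command line you can paste into a shell on the Nagios server.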
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
crrussell3
Posts: 31
Joined: Tue Oct 10, 2017 9:09 am

Re: NRPE Errors

Post by crrussell3 »

Here are the two files requested.

Here is a copy of the alert we received:

Code: Select all

Nagios has detected a problem with this service.

Notification Type: PROBLEM

Service: CORPHVDB-Cluster1 Compellent1 ECOMSQL LOGS Free Space
Host: corphvdb1-h.hy-vee.net
Address: 10.215.20.91
State: CRITICAL
Info:
CHECK_NRPE: Error - Could not complete SSL handshake.
Date/Time: 2018-01-30 15:44:43

Respond: https://nagiosprod1-v/nagiosxi/rr.php?uid=23-754-b40d40b4ddd2b6949d3c15d4809fc480
Nagios URL: https://nagiosprod1-v/nagiosxi/

Notes: Escalate ticket to Systems team 24x7. If after hours, contact the Systems on-call phone number.
We have already set the default timeout on checks to 60 seconds.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: NRPE Errors

Post by npolovenko »

@crrussell3, Do you know what version of NSClient you have installed? I recommend upgrading to the latest version they have on their website: 0.5.2.35.
Also, I've seen some memory-related errors in the log file. I wonder whether that is a memory allocation issue on the NSClient side or whether your Windows server is actually running out of RAM. I'd look into that.
crrussell3
Posts: 31
Joined: Tue Oct 10, 2017 9:09 am

Re: NRPE Errors

Post by crrussell3 »

On this particular server, I am running NSClient++ 0.5.1.044.

This is a Hyper-V Cluster host, but I haven't seen alerts for it being low on available memory itself.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: NRPE Errors

Post by npolovenko »

@crrussell3, Can you add this section to your nsclient.ini file to see if that fixes the issue? You'll need to restart the NSClient service after you make the changes. Also, have you upgraded NSClient recently? If so, was everything working normally before the upgrade?

Code: Select all

[/settings/external scripts]
allow arguments = 1
allow nasty characters = 1
timeout = 90
crrussell3
Posts: 31
Joined: Tue Oct 10, 2017 9:09 am

Re: NRPE Errors

Post by crrussell3 »

Currently I have:

Code: Select all

[/settings/external scripts]
allow arguments = true
allow nasty characters = true
Is there a difference between using "true" or "1"?

I have not upgraded versions. The SSL error doesn't occur very often; the "No route to host" error happens a little more often than the SSL one.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: NRPE Errors

Post by npolovenko »

@crrussell3, Using true instead of 1 should be OK. Did you add the timeout value? After that, please restart the NSClient++ service and see whether the problem is fixed.
crrussell3
Posts: 31
Joined: Tue Oct 10, 2017 9:09 am

Re: NRPE Errors

Post by crrussell3 »

I added the timeout and restarted the service.

I will monitor to see if I get as many SSL errors.

Looking through my state history, I can see that this particular server has had the SSL error 17 times across multiple checks.
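To compare the counts before and after the change, something like the Python sketch below can tally the two error strings from an exported state history. The input format (one check result per line) is an assumption, and `count_nrpe_errors` is a hypothetical helper, not a Nagios XI feature.

```python
from collections import Counter

# Error substrings taken from the check output quoted in this thread.
KNOWN_ERRORS = {
    "ssl_handshake": "Could not complete SSL handshake",
    "no_route": "No route to host",
}


def count_nrpe_errors(lines):
    """Count how many lines contain each known NRPE error string."""
    counts = Counter()
    for line in lines:
        for label, needle in KNOWN_ERRORS.items():
            if needle in line:
                counts[label] += 1
    return counts
```

Feeding it the lines of an exported history file, for example `count_nrpe_errors(open(path))`, returns a per-error tally that can be compared week over week.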
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: NRPE Errors

Post by npolovenko »

@crrussell3, Sounds good. Let us know if you still experience problems with those two hosts.

In [/settings/NRPE/server] please also add this:

Code: Select all

use SSL = 1
Then restart the NSClient service.
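If you want to double-check that the settings discussed in this thread actually landed in the file, a small Python sketch like the one below can read them back. It relies on Python's `configparser`, whose boolean rules (accepting both `1` and `true`) are Python's own and not necessarily identical to how NSClient++ parses the file. `SAMPLE_INI` and `read_thread_settings` are illustrative names only.

```python
import configparser

# Sample content mirroring the settings discussed in this thread.
SAMPLE_INI = """\
[/settings/external scripts]
allow arguments = true
allow nasty characters = true
timeout = 90

[/settings/NRPE/server]
use SSL = 1
"""


def read_thread_settings(ini_text):
    """Return the settings from this thread, or None where one is missing."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    scripts = "/settings/external scripts"
    server = "/settings/NRPE/server"
    return {
        "allow arguments": parser.getboolean(scripts, "allow arguments", fallback=None),
        "allow nasty characters": parser.getboolean(scripts, "allow nasty characters", fallback=None),
        "timeout": parser.getint(scripts, "timeout", fallback=None),
        "use SSL": parser.getboolean(server, "use SSL", fallback=None),
    }
```

Calling `read_thread_settings` on the text of your real nsclient.ini shows `None` for any of the four values that is missing.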

What version of the check_nrpe plugin is on the Nagios XI side? To check, you can run this command:

Code: Select all

/usr/local/nagios/libexec/check_nrpe
It prints the version number at the top of the output.