NRPE : Intermittent Error - Could not complete SSL handshake

fitzgerm
Posts: 6
Joined: Wed Feb 23, 2011 7:05 am

NRPE : Intermittent Error - Could not complete SSL handshake

Post by fitzgerm »

Sometimes the check on my monitored server comes back with the error "Error - Could not complete SSL handshake", and I can't track down where the problem is.
I have done the following:

- set debug=1 with logging to /var/log/syslog, but it gives no error message other than the SSL one
- set -t 160 on the check_nrpe checks on the host server
- increased connection_timeout to 500 in /etc/nagios/nrpe.cfg
- confirmed that /etc/xinetd.d/nrpe contains the host server's IP address
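
Because the error is intermittent, one way to narrow it down is to run check_nrpe repeatedly from the Nagios server and count how often the handshake fails. The following is only a minimal sketch: the plugin path and host address are hypothetical placeholders, and it assumes the failure text printed by check_nrpe contains the message quoted above.

```python
import subprocess
import time

# Hypothetical values: adjust the plugin path and monitored host for your setup.
CHECK_NRPE = "/usr/local/nagios/libexec/check_nrpe"
HOST = "192.0.2.10"


def classify(output):
    """Label one check_nrpe result by its output text."""
    if "Could not complete SSL handshake" in output:
        return "ssl_error"
    return "other"


def probe(runs=20, pause=30):
    """Run check_nrpe `runs` times, `pause` seconds apart, and count outcomes."""
    counts = {"ssl_error": 0, "other": 0}
    for _ in range(runs):
        start = time.monotonic()
        # Same -t 160 timeout as used on the real checks.
        proc = subprocess.run(
            [CHECK_NRPE, "-H", HOST, "-t", "160"],
            capture_output=True,
            text=True,
        )
        elapsed = time.monotonic() - start
        label = classify(proc.stdout + proc.stderr)
        counts[label] += 1
        print(f"{label} rc={proc.returncode} elapsed={elapsed:.1f}s")
        time.sleep(pause)
    return counts
```

Calling `probe()` on the Nagios server and comparing the elapsed times of failed and successful runs may show whether failures correlate with slow responses.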

The Nagios server is monitoring 2 servers, and only one shows this error. I have compared the configs on both and they appear to be the same.

Is there anything else I can try? I'd appreciate any suggestions.

Thanks
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: NRPE : Intermittent Error - Could not complete SSL hands

Post by agriffin »

There are several FAQs here with possible causes and solutions. Give them a try and let us know if you still can't get it to work.
fitzgerm
Posts: 6
Joined: Wed Feb 23, 2011 7:05 am

Re: NRPE : Intermittent Error - Could not complete SSL hands

Post by fitzgerm »

Thanks for your suggestions and the links. I have gone through the nrpe.cfg configuration file and removed unused checks in case there was a typo in one of them, and so far the error has not returned on the server having the problem. I will monitor it and post an update, but hopefully the issue is resolved.
fitzgerm
Posts: 6
Joined: Wed Feb 23, 2011 7:05 am

Re: NRPE : Intermittent Error - Could not complete SSL hands

Post by fitzgerm »

It doesn't look like the issue is resolved. I am still seeing this error intermittently.

I think it may be some kind of timeout, as the syslog shows the connection lasting a long time just before the SSL error, e.g.:

Oct 16 13:25:00 blade3 nrpe[17157]: Handling the connection...
Oct 16 13:25:00 blade3 nrpe[17157]: Error: Could not complete SSL handshake. 5
Oct 16 13:25:00 blade3 xinetd[1614]: EXIT: nrpe status=0 pid=17157 duration=143(sec)

Oct 16 14:10:24 blade3 nrpe[13519]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Oct 16 14:10:24 blade3 nrpe[13519]: Handling the connection...
Oct 16 14:10:24 blade3 nrpe[13519]: Error: Could not complete SSL handshake. 5
Oct 16 14:10:24 blade3 xinetd[1614]: EXIT: nrpe status=0 pid=13519 duration=125(sec)
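
To pull these durations out of a larger syslog, the error lines can be matched to the xinetd EXIT lines by pid. This is a minimal sketch that assumes the log format is exactly as in the excerpt above; the function name and regular expressions are illustrative only.

```python
import re

# Sample lines copied from the syslog excerpt above.
LOG = """\
Oct 16 13:25:00 blade3 nrpe[17157]: Handling the connection...
Oct 16 13:25:00 blade3 nrpe[17157]: Error: Could not complete SSL handshake. 5
Oct 16 13:25:00 blade3 xinetd[1614]: EXIT: nrpe status=0 pid=17157 duration=143(sec)
Oct 16 14:10:24 blade3 nrpe[13519]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Oct 16 14:10:24 blade3 nrpe[13519]: Handling the connection...
Oct 16 14:10:24 blade3 nrpe[13519]: Error: Could not complete SSL handshake. 5
Oct 16 14:10:24 blade3 xinetd[1614]: EXIT: nrpe status=0 pid=13519 duration=125(sec)
"""

ERR_RE = re.compile(r"nrpe\[(\d+)\]: Error: Could not complete SSL handshake")
EXIT_RE = re.compile(
    r"xinetd\[\d+\]: EXIT: nrpe status=\d+ pid=(\d+) duration=(\d+)\(sec\)"
)


def failed_handshake_durations(text):
    """Return {pid: duration_seconds} for nrpe processes that logged the SSL error."""
    failed = set()
    durations = {}
    for line in text.splitlines():
        m = ERR_RE.search(line)
        if m:
            failed.add(m.group(1))
            continue
        m = EXIT_RE.search(line)
        if m and m.group(1) in failed:
            durations[int(m.group(1))] = int(m.group(2))
    return durations


print(failed_handshake_durations(LOG))  # {17157: 143, 13519: 125}
```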


However, I have increased all the timeout values. On the host server I have set -t 160 on the check_nrpe commands. On the monitored server I have set the following in nrpe.cfg:


# COMMAND TIMEOUT
# This specifies the maximum number of seconds that the NRPE daemon will
# allow plugins to finish executing before killing them off.

command_timeout=200


# CONNECTION TIMEOUT
# This specifies the maximum number of seconds that the NRPE daemon will
# wait for a connection to be established before exiting. This is sometimes
# seen where a network problem stops the SSL being established even though
# all network sessions are connected. This causes the nrpe daemons to
# accumulate, eating system resources. Do not set this too low.

connection_timeout=500

Should the service_check_timeout in nagios.cfg on the host server also be increased?
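
As a quick sanity check, the durations logged by xinetd can be compared against the timeouts quoted in this post. The sketch below uses only those numbers; service_check_timeout is left out because its current value is not given here, and the dictionary keys are just labels.

```python
# Timeout values quoted in this post, in seconds.
TIMEOUTS = {
    "check_nrpe -t": 160,
    "command_timeout": 200,
    "connection_timeout": 500,
}

# xinetd durations logged for the two failed connections.
OBSERVED = [143, 125]


def limits_reached(duration, limits):
    """Return the names of the timeouts that `duration` meets or exceeds."""
    return [name for name, limit in limits.items() if duration >= limit]


for d in OBSERVED:
    print(d, limits_reached(d, TIMEOUTS))
# 143 []
# 125 []
```

Neither logged duration reaches any of the three configured values, so if a timeout is involved, it would have to be one that is not listed here.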