We're trying to configure NRPE monitoring on a server that already has an NRPE configuration in place, because it is being monitored by another team. After completing the necessary steps (adding our Nagios server to the allowed hosts in the nrpe.cfg file, etc.), we're having connectivity issues from our Nagios server to this server. The errors we're seeing are SSL-related, as shown below.
Here's the command being run from our Nagios server:
[nagios@a1c-nxi01 etc]$ /usr/local/nagios/libexec/check_nrpe -H 172.20.0.3
CHECK_NRPE: Error - Could not connect to 172.20.0.3: Connection reset by peer
[nagios@a1c-nxi01 etc]$ /usr/local/nagios/libexec/check_nrpe -H 172.20.0.3 --no-ssl
CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
Here are the logs from the server:
Sep 21 14:54:26 wmdweb02-new nrpe[28300]: Error: Network server getpeername() failure (107: Transport endpoint is not connected)
Sep 21 14:54:26 wmdweb02-new nrpe[28300]: Error: (!log_opts) Could not complete SSL handshake with : timeout 300 seconds
Sep 21 14:54:38 wmdweb02-new nrpe[28314]: Error: (!log_opts) Could not complete SSL handshake with 10.200.203.4: 1
Here's a snippet of the nrpe.cfg on the server (the last IP is our Nagios server):
We're not using xinetd on this server:
[adventone@wmdweb02-new xinetd.d]$ sudo ls -la /etc/xinetd.d/nrpe
ls: cannot access /etc/xinetd.d/nrpe: No such file or directory
Thanks for following up. Taking a step back and reviewing the original post, it appears that the connection is never established. Please verify that no security software such as SELinux (check with sestatus) or firewall rules are blocking port 5666. We also see that you are running the check as the 'nagios' user account; please try running it as root as well (su -l root) to see whether you get different results.
What version of NRPE is installed?
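The SELinux, firewall and port checks mentioned above could be run together with something like the following. This is only a sketch: the function name check_nrpe_blockers is made up for illustration, it assumes firewalld (firewall-cmd) is the firewall front end, and each check is skipped if its tool is not installed. Run it on the monitored server, ideally as root.

```shell
# Sketch: run on the monitored host. Each check is skipped if its tool is absent.
# check_nrpe_blockers is a hypothetical helper name, not part of NRPE.
check_nrpe_blockers() {
    port="${1:-5666}"

    # SELinux mode (enforcing / permissive / disabled), if the tools are installed.
    if command -v sestatus >/dev/null 2>&1; then
        sestatus || echo "sestatus failed"
    else
        echo "sestatus not found; SELinux tools may not be installed"
    fi

    # Is anything listening on the NRPE port?
    if command -v ss >/dev/null 2>&1; then
        ss -tln | grep -E "[:.]${port}[[:space:]]" \
            || echo "nothing listening on TCP port ${port}"
    else
        echo "ss not found; cannot list listening sockets"
    fi

    # firewalld rules (assumption: firewalld is in use; usually needs root).
    if command -v firewall-cmd >/dev/null 2>&1; then
        firewall-cmd --list-all 2>&1 \
            || echo "firewall-cmd failed (is firewalld running, and are you root?)"
    else
        echo "firewall-cmd not found; check iptables/nftables rules instead"
    fi
    return 0
}

# Example:
#   check_nrpe_blockers 5666
```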
This is the NRPE version on the server we're trying to monitor:
[adventone@wmdweb02-new run]$ /usr/sbin/nrpe -V
NRPE - Nagios Remote Plugin Executor
Version: 4.0.3
[root@a1c-nxi01 cantonio]# /usr/local/nagios/libexec/check_nrpe -H 172.20.0.3 -c check_users -a '-w 5 -c 10' --no-ssl
CHECK_NRPE: Error - Could not connect to 172.20.0.3: Connection reset by peer
logs showing in /var/log/messages:
Sep 23 15:50:10 wmdweb02-new nrpe[19566]: Error: Network server getpeername() failure (107: Transport endpoint is not connected)
Sep 23 15:50:10 wmdweb02-new nrpe[19566]: Error: (!log_opts) Could not complete SSL handshake with : timeout 300 seconds
And the same command without --no-ssl:
[root@a1c-nxi01 cantonio]# /usr/local/nagios/libexec/check_nrpe -H 172.20.0.3 -c check_users -a '-w 5 -c 10'
CHECK_NRPE: Error - Could not connect to 172.20.0.3: Connection reset by peer
logs showing in /var/log/messages:
Sep 23 15:52:06 wmdweb02-new nrpe[20043]: Error: Network server getpeername() failure (107: Transport endpoint is not connected)
Sep 23 15:52:06 wmdweb02-new nrpe[20043]: Error: (!log_opts) Could not complete SSL handshake with : timeout 300 seconds
If you have tcpdump installed, please capture traffic on the monitored server while the check runs, to see whether our requests reach it on port 5666.
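A capture for this could look roughly like the sketch below. The helper name nrpe_capture_filter and the address 192.0.2.10 are placeholders (substitute the source address your Nagios server actually connects from); the tcpdump options used are -nn (no name or port resolution) and -i any (all interfaces).

```shell
# Build a tcpdump filter for NRPE traffic from one source address.
# Usage: nrpe_capture_filter <source_ip> [port]
# (hypothetical helper; the port defaults to NRPE's usual 5666)
nrpe_capture_filter() {
    src="$1"
    port="${2:-5666}"
    printf 'tcp port %s and host %s\n' "$port" "$src"
}

# Example (run as root on the monitored host; 192.0.2.10 is a placeholder):
#   tcpdump -nn -i any "$(nrpe_capture_filter 192.0.2.10)"
```

If no packets show up while check_nrpe is running, the traffic is being dropped before it reaches the host. If packets arrive and are answered with a reset, the problem is more likely on the host itself.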
Thanks for following up. It appears that all other protocols are able to establish a connection on port 5666 to 172.20.0.3. I took a look at the nrpe.cfg and see that the 'allowed_hosts' option will apply in your case, since NRPE is not running under inetd (please verify) or xinetd.
# ALLOWED HOST ADDRESSES
# This is an optional comma-delimited list of IP address or hostnames
# that are allowed to talk to the NRPE daemon. Network addresses with a bit mask
# (i.e. 192.168.1.0/24) are also supported. Hostname wildcards are not currently
# supported.
#
# Note: The daemon only does rudimentary checking of the client's IP
# address. I would highly recommend adding entries in your /etc/hosts.allow
# file to allow only the specified host to connect to the port
# you are running this daemon on.
#
# NOTE: This option is ignored if NRPE is running under either inetd or xinetd
Edit the allowed_hosts line in /usr/local/nagios/etc/nrpe.cfg and add the IP address of the server that you want to allow connections from.
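As a rough illustration of that edit, the helper below appends an address to the allowed_hosts line of an nrpe.cfg-style file. The function name add_allowed_host and the address 192.0.2.10 are placeholders, and the sketch assumes a single comma-separated allowed_hosts= line without spaces. NRPE has to be restarted afterwards for the change to take effect.

```shell
# Append an address to the allowed_hosts line of an nrpe.cfg-style file.
# Usage: add_allowed_host <cfg_file> <ip_or_cidr>
# (hypothetical helper; assumes a comma-separated list without spaces)
add_allowed_host() {
    cfg="$1"
    host="$2"

    current=$(grep '^allowed_hosts=' "$cfg" | head -n 1)
    if [ -z "$current" ]; then
        echo "no allowed_hosts line in $cfg" >&2
        return 1
    fi

    # Skip the edit if the address is already present as a whole list entry.
    case ",${current#allowed_hosts=}," in
        *",${host},"*) return 0 ;;
    esac

    # Rewrite through a temporary file so the original is only replaced on success.
    tmp=$(mktemp) || return 1
    sed "s|^allowed_hosts=.*|${current},${host}|" "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (192.0.2.10 stands in for the Nagios server's address):
#   add_allowed_host /usr/local/nagios/etc/nrpe.cfg 192.0.2.10
```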
Yes, we've ensured that our Nagios server is listed on the allowed_hosts variable in the nrpe.cfg file. We've restarted the nrpe service many times. Still the same issue.
It is not possible for us to re-install the existing NRPE on these servers as it is owned/used by another team. Is it instead possible for us to run our own instance of NRPE on the same server? Would we just need a different dedicated user/group/port for the NRPE?
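For reference, a second instance along those lines might be set up roughly as sketched below. This is untested against this environment and everything in it is an assumption: the helper name write_second_nrpe_cfg, port 5667, the nagios2 user and group, the pid file path and the plugin path are all placeholders. The directives themselves (server_port, pid_file, nrpe_user, nrpe_group, allowed_hosts, command[...]) are standard nrpe.cfg options.

```shell
# Write a minimal config for a second, independent NRPE instance.
# Usage: write_second_nrpe_cfg <cfg_path> <port> <allowed_hosts>
# (hypothetical helper; every value written below is a placeholder)
write_second_nrpe_cfg() {
    cfg="$1"
    port="$2"
    hosts="$3"
    cat > "$cfg" <<EOF
# Second NRPE instance (sketch)
server_port=${port}
pid_file=/var/run/nrpe-second.pid
nrpe_user=nagios2
nrpe_group=nagios2
allowed_hosts=${hosts}
command_timeout=60
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
EOF
}

# Example (paths and addresses are placeholders):
#   write_second_nrpe_cfg /etc/nagios/nrpe-second.cfg 5667 127.0.0.1,192.0.2.10
#   /usr/sbin/nrpe -c /etc/nagios/nrpe-second.cfg -d
# Then, from the Nagios server:
#   /usr/local/nagios/libexec/check_nrpe -H 172.20.0.3 -p 5667
```

The second daemon needs its own port and pid file so that it does not collide with the other team's instance, and the new port has to be opened in any firewall between the two hosts.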