
CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.

Posted: Wed Sep 26, 2018 6:31 pm
by azharctos
Hi,

This server has been monitored by Nagios for almost 2-3 months. Today we suddenly started receiving the error above, and the service keeps flapping. So far we have been unable to solve it.

1: The NRPE port (5666) is still open:

[root@dc-nagios ~]# telnet xxxxxx 5666
Trying xxxxxx..
Connected to xxxx.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

2: We stopped and started xinetd (we found many defunct processes), but the issue remains:
[root@xxxxx~]# ps aux | grep nrpe
nagios 58213 0.0 0.0 0 0 ? Zs 07:08 0:00 [nrpe] <defunct>
nagios 58216 0.0 0.0 0 0 ? Zs 07:08 0:00 [nrpe] <defunct>
nagios 58224 0.0 0.0 0 0 ? Zs 07:08 0:00 [nrpe] <defunct>
nagios 58227 0.0 0.0 0 0 ? Zs 07:08 0:00 [nrpe] <defunct>
nagios 58229 0.0 0.0 0 0 ? Zs 07:08 0:00 [nrpe] <defunct>
nagios 58231 0.0 0.0 0 0 ? Zs 07:08 0:00 [nrpe] <defunct>
nagios 58236 0.0 0.0 0 0 ? Zs 07:08 0:00 [nrpe] <defunct>
nagios 58239 0.0 0.0 6232 788 ? Ss 07:08 0:00 nrpe -c /usr/local/nagios/etc/nrpe.cfg --inetd
nagios 58240 0.0 0.0 6232 784 ? Ss 07:08 0:00 nrpe -c /usr/local/nagios/etc/nrpe.cfg --inetd
nagios 58243 0.0 0.0 6232 784 ? Ss 07:08 0:00 nrpe -c /usr/local/nagios/etc/nrpe.cfg --inetd
nagios 58244 0.0 0.0 6232 788 ? Ss 07:08 0:00 nrpe -c /usr/local/nagios/etc/nrpe.cfg --inetd
nagios 58245 0.0 0.0 6232 788 ? Ss 07:08 0:00 nrpe -c /usr/local/nagios/etc/nrpe.cfg --inetd
nagios 58246 0.0 0.0 6232 780 ? Ss 07:08 0:00 nrpe -c /usr/local/nagios/etc/nrpe.cfg --inetd
root 58266 0.0 0.0 103320 908 pts/0 S+ 07:08 0:00 grep nrpe
[root@xxxx~]#

3: NRPE configuration:
[root@dxxxxxx ~]# cat /etc/xinetd.d/nrpe
# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
flags = REUSE
socket_type = stream
port = 5666
wait = no
user = nagios
group = nagios
server = /usr/local/nagios/bin/nrpe
server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd
log_on_failure += USERID
disable = no
only_from = 127.0.0.1 xxxxxx xxxxxxxx
}

4: No issues on the network or security side.
5: So far no errors found on the server side (dmesg, messages).
6: Server load, memory, and network are still OK.
7: We rebooted the server, but after a few hours (2-3 hours) the issue occurred again.
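
For the defunct nrpe processes listed in step 2, it can help to see which parent process has not reaped them. The following is only a rough sketch, not an official Nagios tool: `count_nrpe_zombies` is a hypothetical helper name, and it assumes input in the column order produced by `ps -eo pid,ppid,stat,comm`.

```shell
# Count zombie (defunct) nrpe processes per parent PID.
# Reads `ps -eo pid,ppid,stat,comm` output on stdin:
#   column 2 = parent PID, column 3 = state, column 4 = command name.
count_nrpe_zombies() {
    awk '$3 ~ /^Z/ && $4 ~ /nrpe/ { n[$2]++ }
         END { for (p in n) print p, n[p] }'
}

# Usage on the monitored host (prints "<parent PID> <zombie count>" per parent):
#   ps -eo pid,ppid,stat,comm | count_nrpe_zombies
```

If the parent PID turns out to be xinetd, that would suggest the zombies are children xinetd has not yet collected, but that is something to confirm on the host rather than assume.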

+++
At first we were able to run the check; after that the socket times out, even from the local server.
[root@xxxxxx04 libexec]# ./check_nrpe -H localhost
NRPE v2.13
[root@xxxxxx libexec]# ./check_nrpe -H localhost
CHECK_NRPE: Socket timeout after 10 seconds.
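
Since the check works at first and only starts timing out later, one way to pin down when it breaks is to run the same local check in a loop and log each result with a timestamp. This is a minimal sketch under stated assumptions: `poll_check` is a hypothetical helper, and the log path in the usage comment is only an example.

```shell
# Run a command repeatedly and print one timestamped OK/FAIL line per run.
# $1 = number of runs, $2 = seconds to wait between runs, rest = command to run.
poll_check() {
    runs=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$runs" ]; do
        # A non-zero exit status (for example a socket timeout) is logged as FAIL.
        if output=$("$@" 2>&1); then status=OK; else status=FAIL; fi
        printf '%s %s %s\n' "$(date '+%F %T')" "$status" "$output"
        i=$((i + 1))
        if [ "$i" -lt "$runs" ]; then sleep "$interval"; fi
    done
}

# Usage from the libexec directory (example log path, adjust as needed):
#   poll_check 720 10 ./check_nrpe -H localhost -t 10 >> /tmp/nrpe_poll.log
```

The timestamp of the first FAIL line can then be compared against entries in dmesg and the messages log from the same time.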

Client : Red Hat Enterprise Linux Server release 6.8 (Santiago) (64 bit)
Nagios server : Red Hat Enterprise Linux Server release 7.4 (Maipo) (64 bit)
The Nagios server runs on a physical server.

Re: CHECK_NRPE STATE CRITICAL: Socket timeout after 30 secon

Posted: Thu Sep 27, 2018 10:58 am
by tgriep
I answered the ticket you opened but I'll put the same info here.
Let's enable debugging for the NRPE agent that is running on the remote server.
Edit the nrpe.cfg file and change this line from

Code: Select all

debug=0
to

Code: Select all

debug=1
Save the file and restart xinetd so the change takes effect.

Code: Select all

service xinetd restart
The NRPE agent logs its messages using syslog, so the debug messages should show up in the /var/log/messages file on the remote server.
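
To pull just the NRPE entries out of that file, something like the following sketch may be enough. `nrpe_log_lines` is a hypothetical helper name, and it assumes the NRPE messages contain the string "nrpe" in the syslog line.

```shell
# Print the most recent syslog lines that mention nrpe.
# $1 = log file (defaults to /var/log/messages), $2 = number of lines (defaults to 50).
nrpe_log_lines() {
    grep -i 'nrpe' "${1:-/var/log/messages}" | tail -n "${2:-50}"
}

# Usage on the remote server, run as root:
#   nrpe_log_lines
#   nrpe_log_lines /var/log/messages 200
```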

If you find any errors, post them to the ticket so I can view them.