CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Posted: Wed Sep 26, 2018 6:31 pm
Hi,
This server has been monitored by Nagios for almost 2-3 months. Today we suddenly started receiving the error above and the service keeps flapping. So far we have been unable to solve it.
1: The port is still open
[root@dc-nagios ~]# telnet xxxxxx 5666
Trying xxxxxx..
Connected to xxxx.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
2: We stopped and started xinetd (we found many defunct processes), but the issue still remains
[root@xxxxx~]# ps aux | grep nrpe
nagios 58213 0.0 0.0 0 0 ? Zs 07:08 0:00 [nrpe] <defunct>
nagios 58216 0.0 0.0 0 0 ? Zs 07:08 0:00 [nrpe] <defunct>
nagios 58224 0.0 0.0 0 0 ? Zs 07:08 0:00 [nrpe] <defunct>
nagios 58227 0.0 0.0 0 0 ? Zs 07:08 0:00 [nrpe] <defunct>
nagios 58229 0.0 0.0 0 0 ? Zs 07:08 0:00 [nrpe] <defunct>
nagios 58231 0.0 0.0 0 0 ? Zs 07:08 0:00 [nrpe] <defunct>
nagios 58236 0.0 0.0 0 0 ? Zs 07:08 0:00 [nrpe] <defunct>
nagios 58239 0.0 0.0 6232 788 ? Ss 07:08 0:00 nrpe -c /usr/local/nagios/etc/nrpe.cfg --inetd
nagios 58240 0.0 0.0 6232 784 ? Ss 07:08 0:00 nrpe -c /usr/local/nagios/etc/nrpe.cfg --inetd
nagios 58243 0.0 0.0 6232 784 ? Ss 07:08 0:00 nrpe -c /usr/local/nagios/etc/nrpe.cfg --inetd
nagios 58244 0.0 0.0 6232 788 ? Ss 07:08 0:00 nrpe -c /usr/local/nagios/etc/nrpe.cfg --inetd
nagios 58245 0.0 0.0 6232 788 ? Ss 07:08 0:00 nrpe -c /usr/local/nagios/etc/nrpe.cfg --inetd
nagios 58246 0.0 0.0 6232 780 ? Ss 07:08 0:00 nrpe -c /usr/local/nagios/etc/nrpe.cfg --inetd
root 58266 0.0 0.0 103320 908 pts/0 S+ 07:08 0:00 grep nrpe
[root@xxxx~]#
3: NRPE configuration:
[root@dxxxxxx ~]# cat /etc/xinetd.d/nrpe
# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
flags = REUSE
socket_type = stream
port = 5666
wait = no
user = nagios
group = nagios
server = /usr/local/nagios/bin/nrpe
server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd
log_on_failure += USERID
disable = no
only_from = 127.0.0.1 xxxxxx xxxxxxxx
}
4: No issues on the network or security side.
5: So far no errors found on the server side (dmesg, messages).
6: Server load, memory and network are still OK.
7: We rebooted the server, but after a few hours (2-3 hours) the issue occurred again.
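To keep an eye on the defunct nrpe processes from step 2, a small helper like the one below could count them per parent PID. This is only a rough sketch: the function name count_nrpe_zombies is made up for illustration, and it assumes ps output in "pid ppid stat comm" column order.

```shell
#!/bin/sh
# Count defunct (zombie) nrpe processes, grouped by parent PID.
# Reads lines of "pid ppid stat comm" on stdin and prints "ppid count".
# count_nrpe_zombies is a hypothetical helper name, not part of NRPE.
count_nrpe_zombies() {
    awk '$3 ~ /^Z/ && $4 ~ /nrpe/ { n[$2]++ }
         END { for (p in n) print p, n[p] }'
}

# Example on a live system (procps ps, headers suppressed with "="):
#   ps -eo pid=,ppid=,stat=,comm= | count_nrpe_zombies
```

If every zombie shares one parent PID, that parent (most likely xinetd here) is the process that is not reaping its children.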
+++
At first check_nrpe runs fine; after that we get a socket timeout, even from the local server.
[root@xxxxxx04 libexec]# ./check_nrpe -H localhost
NRPE v2.13
[root@xxxxxx libexec]# ./check_nrpe -H localhost
CHECK_NRPE: Socket timeout after 10 seconds.
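To pin down when the local check starts failing after a restart, something like the loop below could be left running. It is a minimal sketch: probe_until_fail is a hypothetical helper name, and the try count and interval are arbitrary values.

```shell
#!/bin/sh
# Run a command repeatedly, print one timestamped line per attempt,
# and stop at the first non-zero exit status.
# Usage: probe_until_fail MAX_TRIES INTERVAL_SECONDS command [args...]
# probe_until_fail is a hypothetical helper name.
probe_until_fail() {
    max=$1; interval=$2; shift 2
    i=1
    while [ "$i" -le "$max" ]; do
        # Capture output and exit status of the probed command.
        if out=$("$@" 2>&1); then rc=0; else rc=$?; fi
        printf '%s try=%s rc=%s %s\n' "$(date '+%F %T')" "$i" "$rc" "$out"
        if [ "$rc" -ne 0 ]; then
            return "$rc"
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 0
}

# Example: one probe per minute, up to 300 tries (values are assumptions):
#   probe_until_fail 300 60 ./check_nrpe -H localhost -t 10
```

The timestamp of the last line can then be compared with /var/log/messages and the process list at that moment.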
Client : Red Hat Enterprise Linux Server release 6.8 (Santiago) (64 bit)
Nagios server : Red Hat Enterprise Linux Server release 7.4 (Maipo) (64 bit)
The Nagios server is running on a physical server.