Host check times out but plugin from command line returns OK

Posted: Mon Jun 28, 2021 2:52 pm
by amb_gopai
My issue appears identical to that described here:

https://support.nagios.com/forum/viewto ... =7&t=56089

But that topic is locked and does not appear to have a resolution.

Software version: Nagios core 4.4.6 (April 28, 2020)
Running on CentOS Linux release 7.9.2009
Nagios plugins EPEL7 v. 2.3.3-2.el7

The relevant entry from the Nagios log file:
[1624909376] wproc: Core Worker 18130: job 51 (pid=27688) timed out. Killing it
[1624909376] wproc: CHECK job 51 from worker Core Worker 18130 timed out after 30.01s
[1624909376] wproc: host=ns2.gopai.com; service=(null);
[1624909376] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1624909376] Warning: Check of host 'ns2.gopai.com' timed out after 30.01 seconds
[1624909376] wproc: Core Worker 18130: job 51 (pid=27688): Dormant child reaped
Relevant command config:
define command {
command_name check-host-alive-dyn
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p $ARG3$
}
Same command run from command line while active checks are failing:
[root@nagios log]# for ((i=0; i<10; i++)); do echo "trial $i"; time /usr/lib64/nagios/plugins/check_ping ns2.gopai.com -w 8000.0,80% -c 15000.0,100% -p 10; done
trial 0
PING OK - Packet loss = 0%, RTA = 66.47 ms|rta=66.473999ms;8000.000000;15000.000000;0.000000 pl=0%;80;100;0

real 0m9.141s
user 0m0.004s
sys 0m0.009s
trial 1
PING OK - Packet loss = 0%, RTA = 66.56 ms|rta=66.559998ms;8000.000000;15000.000000;0.000000 pl=0%;80;100;0

real 0m9.089s
user 0m0.004s
sys 0m0.008s
trial 2
PING OK - Packet loss = 0%, RTA = 66.44 ms|rta=66.442001ms;8000.000000;15000.000000;0.000000 pl=0%;80;100;0

real 0m9.092s
user 0m0.001s
sys 0m0.012s
trial 3
PING OK - Packet loss = 0%, RTA = 66.84 ms|rta=66.837997ms;8000.000000;15000.000000;0.000000 pl=0%;80;100;0

real 0m9.093s
user 0m0.002s
sys 0m0.010s
trial 4
PING OK - Packet loss = 0%, RTA = 66.59 ms|rta=66.592003ms;8000.000000;15000.000000;0.000000 pl=0%;80;100;0

real 0m9.092s
user 0m0.006s
sys 0m0.007s
trial 5
PING OK - Packet loss = 0%, RTA = 66.74 ms|rta=66.738998ms;8000.000000;15000.000000;0.000000 pl=0%;80;100;0

real 0m9.092s
user 0m0.005s
sys 0m0.008s
trial 6
PING OK - Packet loss = 0%, RTA = 68.44 ms|rta=68.438004ms;8000.000000;15000.000000;0.000000 pl=0%;80;100;0

real 0m9.091s
user 0m0.005s
sys 0m0.007s
trial 7
PING OK - Packet loss = 0%, RTA = 66.61 ms|rta=66.605003ms;8000.000000;15000.000000;0.000000 pl=0%;80;100;0

real 0m9.083s
user 0m0.006s
sys 0m0.007s
trial 8
PING OK - Packet loss = 0%, RTA = 66.43 ms|rta=66.428001ms;8000.000000;15000.000000;0.000000 pl=0%;80;100;0

real 0m9.094s
user 0m0.003s
sys 0m0.011s
trial 9
PING OK - Packet loss = 0%, RTA = 67.17 ms|rta=67.170998ms;8000.000000;15000.000000;0.000000 pl=0%;80;100;0

real 0m9.093s
user 0m0.006s
sys 0m0.007s
[root@nagios log]#
I'm not sure what the disconnect is here, but I have never managed to catch this check taking more than 10 seconds when run from the command line.

I'm starting a new thread because the relevant one I found was locked. Does anyone have suggestions for troubleshooting this further?

Re: Host check times out but plugin from command line return

Posted: Tue Jun 29, 2021 12:36 pm
by pbroste
Hello,
Thanks for following up and reaching out further on this issue.

It appears that you have done quite a bit of research and troubleshooting up to this point with no clear path to resolution.

I would suggest that you may need to take a step back and look in a different direction. Several suggestions as we advance:
  • Look at network statistics using a "check network plugin" and view the real-time stats on the network adapter to get details.
  • Capture a pcap on the device/server using tcpdump, with filters in place to capture only what is necessary.
  • Capture the plugin results by wrapping the check in the capture_output.pl plugin, for example:
    capture_output.pl /usr/local/nagios/libexec/check_ping [youripaddress] -w 8000.0,80% -c 15000.0,100% -p 10
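
For the tcpdump suggestion, a minimal sketch of a narrow capture might look like the following. The helper name icmp_filter, the interface eth0, and the output path are assumptions for illustration only; substitute your own values. The filter keeps only IPv4 ICMP traffic to or from the monitored host, so the capture stays small.

```shell
# Hypothetical helper: build a BPF filter that keeps only ICMP traffic
# to or from a single host (IPv4 only; IPv6 would need "icmp6").
icmp_filter() {
    printf 'icmp and host %s' "$1"
}

# Assumed values: interface eth0 and the pcap path are placeholders.
# Run as root while the host check is failing, then stop with Ctrl-C:
#   tcpdump -n -i eth0 -w /tmp/ns2_icmp.pcap "$(icmp_filter ns2.gopai.com)"
```

Comparing the packet timestamps in the capture against the time of the "timed out" entry in nagios.log should show whether echo replies were actually arriving during the failed check.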

Example command definition using capture_output.pl:

define command {
command_name check-host-alive-dyn
command_line $USER1$/capture_output.pl /usr/local/nagios/libexec/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p $ARG3$
}
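The output of capture_output.pl is written to the /tmp/ directory. Adjust the check_ping path to match your install; your command-line test used /usr/lib64/nagios/plugins.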
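
If capture_output.pl is not available, the same idea can be sketched in plain shell. This is a hypothetical stand-in, not the real capture_output.pl: the function name capture_check, the CAPTURE_LOG variable, and the log format are all assumptions. It records the start time, elapsed seconds, exit code, and output of each run, then hands the plugin's output and exit code back to the caller.

```shell
#!/bin/sh
# Hypothetical stand-in for a capture wrapper; NOT the real capture_output.pl.
# CAPTURE_LOG is an assumed variable name; it defaults to a file under /tmp.
CAPTURE_LOG="${CAPTURE_LOG:-/tmp/check_capture.log}"

capture_check() {
    start=$(date +%s)
    # Run the plugin exactly as given, merging stderr into stdout.
    output=$("$@" 2>&1)
    status=$?
    end=$(date +%s)
    # Append one record per run: start time, duration, exit code, command,
    # followed by whatever the plugin printed.
    {
        printf 'start=%s elapsed=%ss exit=%s cmd=%s\n' \
            "$start" "$((end - start))" "$status" "$*"
        printf '%s\n' "$output"
    } >> "$CAPTURE_LOG"
    # Hand the plugin's output and exit code back to the caller.
    printf '%s\n' "$output"
    return "$status"
}
```

For example, capture_check /usr/lib64/nagios/plugins/check_ping -H ns2.gopai.com -w 8000.0,80% -c 15000.0,100% -p 10 would log one record per run. Entries with a large elapsed value can then be matched against the timeout lines in the Nagios log.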

Thanks,
Perry