Check_Host_Alive timed out [FIXED]

Posted: Thu Jul 03, 2014 6:41 am
by ITSLR
Hi all,

I'm having problems with the check-host-alive command on a few machines.

I'm running Nagios Core 3.5.1 on CentOS 6.5.

# 'check-host-alive' command definition
define command{
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}

Unfortunately, I have 3 machines that are always shown as "down" with the status "Host Check Timed Out". But when I run the command manually as the nagios user it works fine, and a normal ping works too:

[nagios@kwapp001 ~]$ /usr/lib64/nagios/plugins/check_ping -H vmware001 -w 3000.0,80% -c 5000.0,100% -p 5 -4
PING OK - Packet loss = 0%, RTA = 0.40 ms|rta=0.400000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0
[nagios@kwapp001 ~]$ /usr/lib64/nagios/plugins/check_ping -H vmware002 -w 3000.0,80% -c 5000.0,100% -p 5 -4
PING OK - Packet loss = 0%, RTA = 0.60 ms|rta=0.595000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0
[nagios@kwapp001 ~]$ /usr/lib64/nagios/plugins/check_ping -H vcenter -w 3000.0,80% -c 5000.0,100% -p 5 -4
PING OK - Packet loss = 0%, RTA = 0.53 ms|rta=0.527000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0

[nagios@kwapp001 ~]$ ping vmware001
PING vmware001 (192.168.34.176) 56(84) bytes of data.
64 bytes from vmware001 (192.168.34.176): icmp_seq=1 ttl=64 time=0.201 ms
64 bytes from vmware001 (192.168.34.176): icmp_seq=2 ttl=64 time=0.369 ms
64 bytes from vmware001 (192.168.34.176): icmp_seq=3 ttl=64 time=0.372 ms

I've already tried setting the -t parameter in my command.cfg, but nothing changed.

Any ideas on this problem?

Thanks in advance

Re: Check_Host_Alive timed out

Posted: Thu Jul 03, 2014 9:04 am
by eloyd
I'm still trying to think of what the answer may be but I'm curious where the -4 comes from at the end of these lines:

[nagios@kwapp001 ~]$ /usr/lib64/nagios/plugins/check_ping -H vmware001 -w 3000.0,80% -c 5000.0,100% -p 5 -4
[nagios@kwapp001 ~]$ /usr/lib64/nagios/plugins/check_ping -H vmware002 -w 3000.0,80% -c 5000.0,100% -p 5 -4
[nagios@kwapp001 ~]$ /usr/lib64/nagios/plugins/check_ping -H vcenter -w 3000.0,80% -c 5000.0,100% -p 5 -4

Re: Check_Host_Alive timed out

Posted: Thu Jul 03, 2014 9:34 am
by ITSLR
-4 is for IPv4

Re: Check_Host_Alive timed out

Posted: Thu Jul 03, 2014 9:39 am
by eloyd
I'm not making myself clear. It doesn't show up in your check-host-alive command definition, so where is it coming from or are you adding it manually? I'm trying to make sure that you're comparing apples to apples.

Re: Check_Host_Alive timed out

Posted: Thu Jul 03, 2014 9:53 am
by ITSLR
Now I get it. I added it manually for testing purposes, to make sure it uses IPv4 and not IPv6. In any case, when run manually it works with and without -4.
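
One way to see whether IPv6 could be involved at all is to look at which address families each hostname resolves to. The following is only a minimal sketch: the helper name address_families is hypothetical, the hostnames are the ones from the first post, and it assumes the system resolver returns the same answers the plugin would get.

```python
import socket

def address_families(host):
    """Return the sorted IP families ("IPv4"/"IPv6") that `host` resolves to."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    infos = socket.getaddrinfo(host, None)
    # Each entry is (family, type, proto, canonname, sockaddr); only the family is needed.
    return sorted({names[info[0]] for info in infos if info[0] in names})

if __name__ == "__main__":
    # Hostnames taken from the first post; adjust for your own environment.
    for host in ("vmware001", "vmware002", "vcenter"):
        try:
            print(host, address_families(host))
        except socket.gaierror as exc:
            print(host, "did not resolve:", exc)
```

If a host lists only IPv4, the -4 switch should make no difference for it, which would match the manual test results.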

Re: Check_Host_Alive timed out

Posted: Thu Jul 03, 2014 10:05 am
by eloyd
Okay. I know it wasn't the problem, but I wanted to make sure that my brain wasn't distracted as I sat down to figure out what might be going on.

Do you have other hosts that are being monitored correctly? You say you're monitoring "a few machines" and that you have 3 that exhibit this behavior. My thoughts are:
  • Are they on the same network as your Nagios box?
  • Are they on the same network as each other?
  • If you have boxes that work, are they on a different network or the same one?
  • Can you post the output of grep "^[a-z].*check" nagios.cfg | sort

Re: Check_Host_Alive timed out

Posted: Thu Jul 03, 2014 10:24 am
by ITSLR
>> You say you're monitoring "a few machines" and that you have 3 that exhibit this behavior
I monitor around 180 Hosts (including some virtual ones)

>> Are they on the same network as your Nagios box?
Yes they are - they are in a 192.168.32.0/20 Network

>> Are they on the same network as each other?
Yes they are

>> If you have boxes that work, are they on a different network or the same one?
All machines are in the above network

>> Can you post the output of grep "^[a-z].*check" nagios.cfg | sort
accept_passive_host_checks=1
accept_passive_service_checks=1
auto_reschedule_checks=0
bare_update_check=0
cached_host_check_horizon=15
cached_service_check_horizon=15
check_result_path=/var/log/nagios/spool/checkresults
command_check_interval=30s
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
execute_host_checks=1
execute_service_checks=1
host_check_timeout=30
host_freshness_check_interval=60
host_inter_check_delay_method=s
log_passive_checks=1
max_check_result_file_age=3600
max_check_result_reaper_time=30
max_concurrent_checks=0
max_host_check_spread=30
max_service_check_spread=30
passive_host_checks_are_soft=0
service_check_timeout=120
service_check_timeout_state=c
service_freshness_check_interval=60
service_inter_check_delay_method=s
translate_passive_host_checks=0
use_aggressive_host_checking=0
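
Since host_check_timeout is 30 in the listing above, one way to compare a manual run with what Nagios sees is to run the plugin under the same wall-clock limit and time it. This is a rough sketch, not how Nagios itself executes checks: the helper name run_check is hypothetical, and the plugin path and arguments are copied from the first post.

```python
import subprocess
import time

def run_check(cmd, timeout_s):
    """Run a plugin command with a wall-clock limit and report what happened."""
    start = time.monotonic()
    result = {"timed_out": False, "returncode": None, "output": ""}
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        result["returncode"] = proc.returncode
        result["output"] = proc.stdout
    except subprocess.TimeoutExpired:
        # The child is killed once the limit is exceeded.
        result["timed_out"] = True
    except OSError as exc:
        # Plugin missing or not executable on this machine.
        result["output"] = str(exc)
    result["elapsed"] = time.monotonic() - start
    return result

if __name__ == "__main__":
    # Same arguments as the check-host-alive definition; 30 matches host_check_timeout.
    cmd = ["/usr/lib64/nagios/plugins/check_ping", "-H", "vmware001",
           "-w", "3000.0,80%", "-c", "5000.0,100%", "-p", "5"]
    print(run_check(cmd, timeout_s=30))
```

If the elapsed time is far below 30 seconds when run by hand, the timeout Nagios reports is probably caused by something other than the plugin's own run time.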

Re: Check_Host_Alive timed out

Posted: Thu Jul 03, 2014 10:28 am
by eloyd
Okay, thanks.

Was the "32" a typo below? Because your ping has a 34 where you said 32:
>> they are in a 192.168.32.0/20 Network

[nagios@kwapp001 ~]$ ping vmware001
PING vmware001 (192.168.34.176) 56(84) bytes of data.
64 bytes from vmware001 (192.168.34.176): icmp_seq=1 ttl=64 time=0.201 ms
64 bytes from vmware001 (192.168.34.176): icmp_seq=2 ttl=64 time=0.369 ms
64 bytes from vmware001 (192.168.34.176): icmp_seq=3 ttl=64 time=0.372 ms

Re: Check_Host_Alive timed out

Posted: Thu Jul 03, 2014 11:03 am
by ITSLR
No, it's not a typo. Our network range is 192.168.32.0 - 192.168.37.255; its netmask is 255.255.240.0 (/20) instead of the usual 255.255.255.0 (/24).
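
The subnet arithmetic can be double-checked with Python's standard ipaddress module. A small sketch using only the addresses quoted in this thread:

```python
import ipaddress

# Network and host address as quoted earlier in the thread.
network = ipaddress.ip_network("192.168.32.0/20")
host = ipaddress.ip_address("192.168.34.176")

print(network.netmask)          # 255.255.240.0
print(network[0], network[-1])  # 192.168.32.0 192.168.47.255
print(host in network)          # True
```

A /20 spans 192.168.32.0 through 192.168.47.255, so 192.168.34.176 is inside it; the range ending at 192.168.37.255 mentioned above would then just be the part of the block in use.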

Re: Check_Host_Alive timed out

Posted: Thu Jul 03, 2014 11:06 am
by eloyd
Whoops. I didn't catch the /20.

Rats. I was hoping it was going to be a network thing. :-)

Okay, I'll have to go back to thinking.