Check_Host_Alive timed out [FIXED]

ITSLR
Posts: 9
Joined: Thu Jul 03, 2014 6:30 am

Post by ITSLR »

Hi all,

I'm experiencing some problems with the check-host-alive command on a few machines.

I'm running Nagios Core 3.5.1 on CentOS 6.5.

# 'check-host-alive' command definition
define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
        }

Unfortunately, I have 3 machines that are always "down" with a "Host Check Timed Out" status. But when I run the command manually as the nagios user, it works fine, and a normal ping works too:

[nagios@kwapp001 ~]$ /usr/lib64/nagios/plugins/check_ping -H vmware001 -w 3000.0,80% -c 5000.0,100% -p 5 -4
PING OK - Packet loss = 0%, RTA = 0.40 ms|rta=0.400000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0
[nagios@kwapp001 ~]$ /usr/lib64/nagios/plugins/check_ping -H vmware002 -w 3000.0,80% -c 5000.0,100% -p 5 -4
PING OK - Packet loss = 0%, RTA = 0.60 ms|rta=0.595000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0
[nagios@kwapp001 ~]$ /usr/lib64/nagios/plugins/check_ping -H vcenter -w 3000.0,80% -c 5000.0,100% -p 5 -4
PING OK - Packet loss = 0%, RTA = 0.53 ms|rta=0.527000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0

[nagios@kwapp001 ~]$ ping vmware001
PING vmware001 (192.168.34.176) 56(84) bytes of data.
64 bytes from vmware001 (192.168.34.176): icmp_seq=1 ttl=64 time=0.201 ms
64 bytes from vmware001 (192.168.34.176): icmp_seq=2 ttl=64 time=0.369 ms
64 bytes from vmware001 (192.168.34.176): icmp_seq=3 ttl=64 time=0.372 ms

I've already tried setting the -t parameter in my command.cfg, but nothing changed.

Any ideas on this problem?

Thanks in advance
Last edited by ITSLR on Mon Jul 07, 2014 9:45 am, edited 1 time in total.
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Check_Host_Alive timed out

Post by eloyd »

I'm still trying to think of what the answer may be but I'm curious where the -4 comes from at the end of these lines:

[nagios@kwapp001 ~]$ /usr/lib64/nagios/plugins/check_ping -H vmware001 -w 3000.0,80% -c 5000.0,100% -p 5 -4
[nagios@kwapp001 ~]$ /usr/lib64/nagios/plugins/check_ping -H vmware002 -w 3000.0,80% -c 5000.0,100% -p 5 -4
[nagios@kwapp001 ~]$ /usr/lib64/nagios/plugins/check_ping -H vcenter -w 3000.0,80% -c 5000.0,100% -p 5 -4
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
I'm a Nagios Fanatic! • Join our public Nagios Discord Server!
ITSLR
Posts: 9
Joined: Thu Jul 03, 2014 6:30 am

Re: Check_Host_Alive timed out

Post by ITSLR »

-4 is for IPv4
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Check_Host_Alive timed out

Post by eloyd »

I'm not making myself clear. It doesn't show up in your check-host-alive command definition, so where is it coming from or are you adding it manually? I'm trying to make sure that you're comparing apples to apples.
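
One way to compare like with like is to expand the macros by hand and run exactly the resulting line. The sketch below does only the text substitution. expand_command_line is a made-up helper, not a Nagios tool; the $USER1$ value is assumed from the plugin path in the manual runs above, and the address is the one shown in the ping output. What Nagios really substitutes for $HOSTADDRESS$ is whatever the host definition's address directive holds.

```shell
# Hypothetical helper: substitute the two macros used by check-host-alive.
# usage: expand_command_line TEMPLATE USER1_PATH HOST_ADDRESS
expand_command_line() {
  printf '%s\n' "$1" |
    sed -e "s|\\\$USER1\\\$|$2|g" -e "s|\\\$HOSTADDRESS\\\$|$3|g"
}

# Single quotes keep the shell from touching the $...$ macros.
template='$USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5'

# Assumed values: plugin directory from the manual test, address from the ping output.
expand_command_line "$template" /usr/lib64/nagios/plugins 192.168.34.176
```

The printed line has no -4 and uses the address rather than the host name, so any difference from the manual test shows up at a glance.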
ITSLR
Posts: 9
Joined: Thu Jul 03, 2014 6:30 am

Re: Check_Host_Alive timed out

Post by ITSLR »

Now I get it. I added it manually for testing purposes, to make sure it uses IPv4 and not IPv6. Anyway, run manually, it works with and without -4.
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Check_Host_Alive timed out

Post by eloyd »

Okay. I know it wasn't the problem, but I wanted to make sure that my brain wasn't distracted as I sat down to figure out what might be going on.

Do you have other hosts that are being monitored correctly? You say you're monitoring "a few machines" and that you have 3 that exhibit this behavior. My thoughts are:
  • Are they on the same network as your Nagios box?
  • Are they on the same network as each other?
  • If you have boxes that work, are they on a different network or the same one?
  • Can you post the output of grep "^[a-z].*check" nagios.cfg | sort
ITSLR
Posts: 9
Joined: Thu Jul 03, 2014 6:30 am

Re: Check_Host_Alive timed out

Post by ITSLR »

>> You say you're monitoring "a few machines" and that you have 3 that exhibit this behavior
I monitor around 180 Hosts (including some virtual ones)

>> Are they on the same network as your Nagios box?
Yes they are - they are in a 192.168.32.0/20 Network

>> Are they on the same network as each other?
Yes they are

>> If you have boxes that work, are they on a different network or the same one?
All machines are in the above network

>> Can you post the output of grep "^[a-z].*check" nagios.cfg | sort
accept_passive_host_checks=1
accept_passive_service_checks=1
auto_reschedule_checks=0
bare_update_check=0
cached_host_check_horizon=15
cached_service_check_horizon=15
check_result_path=/var/log/nagios/spool/checkresults
command_check_interval=30s
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
execute_host_checks=1
execute_service_checks=1
host_check_timeout=30
host_freshness_check_interval=60
host_inter_check_delay_method=s
log_passive_checks=1
max_check_result_file_age=3600
max_check_result_reaper_time=30
max_concurrent_checks=0
max_host_check_spread=30
max_service_check_spread=30
passive_host_checks_are_soft=0
service_check_timeout=120
service_check_timeout_state=c
service_freshness_check_interval=60
service_inter_check_delay_method=s
translate_passive_host_checks=0
use_aggressive_host_checking=0
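
With host_check_timeout=30 in the output above, Nagios gives a host check at most 30 seconds before reporting it as timed out. A rough way to reproduce that limit by hand is sketched below. run_with_limit is a made-up helper name, and the sketch assumes the timeout utility from GNU coreutils, which exits with status 124 when the time limit is hit.

```shell
# Hypothetical helper: run a command under a hard time limit and say how it ended.
# usage: run_with_limit SECONDS COMMAND [ARGS...]
run_with_limit() {
  limit=$1
  shift
  rc=0
  # GNU timeout kills the command after $limit seconds and then exits with 124.
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "timed out after ${limit}s"
  else
    # For a Nagios plugin: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    echo "finished with exit code $rc"
  fi
}

# Example, using the plugin path and host name from the first post:
# run_with_limit 30 /usr/lib64/nagios/plugins/check_ping -H vmware001 -w 3000.0,80% -c 5000.0,100% -p 5
```

If the plugin finishes in well under 30 seconds here but the host still shows "Host Check Timed Out", the difference is more likely in how Nagios launches the check than in the plugin itself.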
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Check_Host_Alive timed out

Post by eloyd »

Okay, thanks.

Was the "32" a typo below? Because your ping has a 34 where you said 32:
they are in a 192.168.32.0/20 Network

[nagios@kwapp001 ~]$ ping vmware001
PING vmware001 (192.168.34.176) 56(84) bytes of data.
64 bytes from vmware001 (192.168.34.176): icmp_seq=1 ttl=64 time=0.201 ms
64 bytes from vmware001 (192.168.34.176): icmp_seq=2 ttl=64 time=0.369 ms
64 bytes from vmware001 (192.168.34.176): icmp_seq=3 ttl=64 time=0.372 ms
ITSLR
Posts: 9
Joined: Thu Jul 03, 2014 6:30 am

Re: Check_Host_Alive timed out

Post by ITSLR »

No, it's not a typo. Our network range is 192.168.32.0 - 192.168.37.255, and its netmask is 255.255.240.0 (/20) rather than the usual 255.255.255.0 (/24).
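
For anyone who wants to verify that 192.168.34.176 really sits inside 192.168.32.0/20, the arithmetic can be sketched in plain shell. All function names below are made up for this illustration, and the snippet assumes a shell whose arithmetic is wider than 32 bits (bash and dash on 64-bit systems both qualify). Note that a /20 starting at 192.168.32.0 extends through 192.168.47.255, so the range quoted above is presumably just the part that is in use.

```shell
# Pack a dotted-quad IPv4 address into a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address on "." into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Turn an integer back into dotted-quad notation.
int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Netmask for a prefix length, as an integer (20 -> 255.255.240.0).
prefix_to_mask() {
  echo $(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF ))
}

# usage: in_network ADDRESS NETWORK PREFIX_LENGTH (exit status 0 if inside)
in_network() {
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(prefix_to_mask "$3")
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# usage: last_address NETWORK PREFIX_LENGTH
last_address() {
  net=$(ip_to_int "$1")
  mask=$(prefix_to_mask "$2")
  int_to_ip $(( (net & mask) | (~mask & 0xFFFFFFFF) ))
}

int_to_ip "$(prefix_to_mask 20)"      # netmask of a /20
last_address 192.168.32.0 20          # highest address in the block
if in_network 192.168.34.176 192.168.32.0 20; then
  echo "192.168.34.176 is inside 192.168.32.0/20"
fi
```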
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Check_Host_Alive timed out

Post by eloyd »

Whoops. I didn't catch the /20.

Rats. I was hoping it was going to be a network thing. :-)

Okay, I'll have to go back to thinking.