Every other check-host-alive ping fails

Posted: Thu Jan 14, 2021 2:40 pm
by BuzzKillingtonne
I've recently started having a strange issue with the check-host-alive command, which runs check_ping against all my hosts. I now have two hosts that get a result of 100% packet loss and then, on the retry, come back with 0% packet loss. Without fail, they follow this pattern.

I've tested running a continuous ping from the CLI alongside the check_ping command. The CLI ping never loses more than 1 packet at random, but every other check_ping run reports all 5 packets lost.

I have also tested with check_ping as a service on the hosts in question and their results are the same as the ping run from the CLI.

I have Nagios Core v4.4.6 installed on Ubuntu 20.04.

I would like to add that I have other Nagios servers at other sites. I have added the particular host to those servers and they do not see the same issue, which tells me it is not a problem with the host.

Here are the command and host template being used:

Code: Select all

define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 600.0,80% -c 900.0,100% -p 5
        }


define host{
	name			generic-switch-notify-weekdays	
	use			generic-host
	check_period		24x7
	check_interval		3
	retry_interval		1
	max_check_attempts	3
	check_command		check-host-alive
	notification_period	weekdays
	notification_interval	60
	notification_options	d,r
	contact_groups		admins
	register		0
	}

Here's a snip from the attached text file where I manually ran the check_ping plugin. It failed immediately on the first try and then ran all the pings on the second try. I see that this is likely the problem, but I don't know why it's failing without pinging.

Code: Select all

64 bytes from 192.168.251.80: icmp_seq=1071 ttl=58 time=55.1 ms
64 bytes from 192.168.251.80: icmp_seq=1072 ttl=58 time=48.0 ms
64 bytes from 192.168.251.80: icmp_seq=1073 ttl=58 time=48.9 ms
64 bytes from 192.168.251.80: icmp_seq=1074 ttl=58 time=46.9 ms
64 bytes from 192.168.251.80: icmp_seq=1075 ttl=58 time=47.0 ms
64 bytes from 192.168.251.80: icmp_seq=1076 ttl=58 time=45.9 ms
^C
--- 192.168.251.80 ping statistics ---
1076 packets transmitted, 1074 received, +3 errors, 0.185874% packet loss, time 1076458ms
rtt min/avg/max/mdev = 45.705/50.604/281.101/19.049 ms
admin@nagios:~$ /usr/local/nagios/libexec/check_ping -H 192.168.251.80 -w 600.0,80% -c 900.0,100% -p 100
PING CRITICAL - Packet loss = 100%|rta=900.000000ms;600.000000;900.000000;0.000000 pl=100%;80;100;0             <----- This failed immediately
admin@nagios:~$ /usr/local/nagios/libexec/check_ping -H 192.168.251.80 -w 600.0,80% -c 900.0,100% -p 100
PING OK - Packet loss = 0%, RTA = 50.72 ms|rta=50.715000ms;600.000000;900.000000;0.000000 pl=0%;80;100;0        <----- This succeeded after running all 100 pings with no packet loss
admin@nagios:~$

Re: Every other check-host-alive ping fails

Posted: Wed Jan 27, 2021 11:03 am
by BuzzKillingtonne
I have since replaced the Check_Ping plugin in all my commands with the Check_ICMP plugin and I haven't had further issues.
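
A minimal sketch of what such a replacement command definition might look like, assuming the original thresholds and packet count carry over unchanged. The actual definition used was not posted, so treat this as illustrative only. check_icmp takes its packet count with -n, and it normally needs root privileges (for example a setuid-root binary) to open a raw ICMP socket.

Code: Select all

# Assumption: same thresholds as the check_ping version above; -n 5 sends 5 packets
define command{
        command_name    check-host-alive
        command_line    $USER1$/check_icmp -H $HOSTADDRESS$ -w 600.0,80% -c 900.0,100% -n 5
        }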

Re-installing Nagios fresh did not change anything, and only the two hosts ever had an issue. I still have no idea what happened or why Check_Ping doesn't work correctly.

Re: Every other check-host-alive ping fails

Posted: Wed Feb 03, 2021 3:37 pm
by benjaminsmith
Hi!

"I have since replaced the Check_Ping plugin in all my commands with the Check_ICMP plugin and I haven't had further issues."

Thanks for the update and for sharing the solution!

We'll close this thread, but feel free to open another if you have any new issues.

Benjamin
Nagios Support Team