Nagios Support Forum

Posted: **Thu Jan 14, 2021 2:40 pm**

I've recently started having a strange issue with the check-host-alive command, which runs check_ping against all my hosts. I now have two hosts that get a result of 100% packet loss on one check and then 0% packet loss on the retry. They follow this pattern without fail.

I've tested running a continuous ping from the CLI alongside the check_ping command. The CLI ping never loses more than 1 packet at random, but every other check_ping run reports all 5 packets lost.

I have also tested with check_ping as a service on the hosts in question and their results are the same as the ping run from the CLI.

I have Nagios Core v4.4.6 installed on Ubuntu 20.04.

I would also like to add that I have other Nagios servers at other sites. I have added the particular host to those servers, and they do not see the same issue. This tells me it is not a problem with the host.

Here are the command and host template being used:

define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 600.0,80% -c 900.0,100% -p 5
        }


define host{
        name                    generic-switch-notify-weekdays
        use                     generic-host
        check_period            24x7
        check_interval          3
        retry_interval          1
        max_check_attempts      3
        check_command           check-host-alive
        notification_period     weekdays
        notification_interval   60
        notification_options    d,r
        contact_groups          admins
        register                0
        }

Here's a snippet from the attached text file where I manually ran the check_ping plugin. It failed immediately on the first try and then ran all the pings on the second try. I see that this is likely the problem, but I don't know why it's failing without pinging.

64 bytes from 192.168.251.80: icmp_seq=1071 ttl=58 time=55.1 ms
64 bytes from 192.168.251.80: icmp_seq=1072 ttl=58 time=48.0 ms
64 bytes from 192.168.251.80: icmp_seq=1073 ttl=58 time=48.9 ms
64 bytes from 192.168.251.80: icmp_seq=1074 ttl=58 time=46.9 ms
64 bytes from 192.168.251.80: icmp_seq=1075 ttl=58 time=47.0 ms
64 bytes from 192.168.251.80: icmp_seq=1076 ttl=58 time=45.9 ms
^C
--- 192.168.251.80 ping statistics ---
1076 packets transmitted, 1074 received, +3 errors, 0.185874% packet loss, time 1076458ms
rtt min/avg/max/mdev = 45.705/50.604/281.101/19.049 ms
admin@nagios:~$ /usr/local/nagios/libexec/check_ping -H 192.168.251.80 -w 600.0,80% -c 900.0,100% -p 100
PING CRITICAL - Packet loss = 100%|rta=900.000000ms;600.000000;900.000000;0.000000 pl=100%;80;100;0             <----- This failed immediately
admin@nagios:~$ /usr/local/nagios/libexec/check_ping -H 192.168.251.80 -w 600.0,80% -c 900.0,100% -p 100
PING OK - Packet loss = 0%, RTA = 50.72 ms|rta=50.715000ms;600.000000;900.000000;0.000000 pl=0%;80;100;0        <----- This succeeded after running all 100 pings with no packet loss
admin@nagios:~$
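
To confirm the every-other-run pattern outside of Nagios, the manual test above can be repeated in a loop while timing each run. The following is a minimal sketch and not something from the original post: `run_checks` is a hypothetical helper name, the thresholds and packet count are copied from the check-host-alive definition, and `date +%s` is assumed to be available (it is on Ubuntu). A run that reports 100% loss after about 0 seconds, instead of after the pings have had time to go out, would match the "failed immediately" behavior noted above.

```shell
#!/bin/sh
# Sketch only. run_checks is a hypothetical helper, not part of Nagios.
# Usage: run_checks PLUGIN_PATH HOST COUNT
# Runs the plugin COUNT times back to back and prints, for each run, the exit
# code, the elapsed wall-clock seconds, and the first line of plugin output.
run_checks() {
    plugin=$1
    host=$2
    count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s)
        # Thresholds and packet count copied from the check-host-alive command.
        # Nagios plugin exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
        if out=$("$plugin" -H "$host" -w 600.0,80% -c 900.0,100% -p 5 2>&1); then
            rc=0
        else
            rc=$?
        fi
        end=$(date +%s)
        first_line=$(printf '%s\n' "$out" | head -n 1)
        printf 'run=%s rc=%s elapsed=%ss %s\n' \
            "$i" "$rc" "$((end - start))" "$first_line"
        i=$((i + 1))
    done
}
```

For example, `run_checks /usr/local/nagios/libexec/check_ping 192.168.251.80 6` would print six lines, one per run. Whole-second timing is coarse, but it should be enough to tell an immediate failure from a run that actually waited on its pings.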

Posted: **Wed Jan 27, 2021 11:03 am**

I have since replaced the check_ping plugin in all my commands with the check_icmp plugin, and I haven't had further issues.

Re-installing Nagios fresh did not change anything, and only the two hosts ever had an issue. I still have no idea what happened or why check_ping doesn't work correctly.

Posted: **Wed Feb 03, 2021 3:37 pm**

Hi!

> I have since replaced the check_ping plugin in all my commands with the check_icmp plugin, and I haven't had further issues.

Thanks for the update and for sharing the solution!

We'll close this thread, but feel free to open another if you have any new issues.

Benjamin
Nagios Support Team

Every other check-host-alive ping fails
