Every other check-host-alive ping fails
Posted: Thu Jan 14, 2021 2:40 pm
I've recently started having a strange issue with the check-host-alive command, which runs check_ping against all my hosts. I now have two hosts that get a result of 100% packet loss and then, on the retry, come back with 0% packet loss. They follow this pattern without fail.
I've tested running a continuous ping from the CLI alongside the check_ping command. The CLI ping never loses more than 1 packet at random, but every other check_ping run reports all 5 packets lost.
I have also tested with check_ping as a service on the hosts in question and their results are the same as the ping run from the CLI.
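For anyone who wants to reproduce the pattern, here is a minimal sketch of the kind of loop that can be used to run the check back to back and log each result. The function name run_repeated and its output format are made up for illustration and are not part of Nagios or the plugins. It assumes a `date` that supports `+%s`, as on Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nagios: run a command several times and
# log the exit status and wall-clock duration of each run.
# Usage: run_repeated COUNT PAUSE_SECONDS COMMAND [ARGS...]
run_repeated() {
    rr_count=$1
    rr_pause=$2
    shift 2
    rr_i=1
    while [ "$rr_i" -le "$rr_count" ]; do
        rr_start=$(date +%s)
        rr_output=$("$@" 2>&1)   # capture plugin output (stdout and stderr)
        rr_status=$?             # plugin exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL
        rr_end=$(date +%s)
        # Print only the first output line, which holds the plugin summary.
        printf 'run=%d exit=%d elapsed=%ds %s\n' \
            "$rr_i" "$rr_status" "$((rr_end - rr_start))" \
            "$(printf '%s\n' "$rr_output" | head -n 1)"
        if [ "$rr_i" -lt "$rr_count" ]; then
            sleep "$rr_pause"
        fi
        rr_i=$((rr_i + 1))
    done
}

# Example call (same arguments as check-host-alive):
# run_repeated 10 5 /usr/local/nagios/libexec/check_ping -H 192.168.251.80 -w 600.0,80% -c 900.0,100% -p 5
```

If the every-other-run pattern holds, the exit codes should alternate between 2 and 0, and a failing run with a very small elapsed value would suggest the check is returning before any pings are sent.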
I have Nagios Core v4.4.6 installed on Ubuntu 20.04.
I would like to add that I have other Nagios servers at other sites. I added the affected host to those servers and they do not see the same issue, which tells me the problem is not with the host itself.
Here is the command and host template being used:
Here's a snippet from the attached text file where I manually ran check_ping: it failed immediately on the first try and then ran all of its pings on the second try. I see that this is likely the problem, but I don't know why it's failing without pinging.
Code:
define command{
    command_name            check-host-alive
    command_line            $USER1$/check_ping -H $HOSTADDRESS$ -w 600.0,80% -c 900.0,100% -p 5
}
define host{
    name                    generic-switch-notify-weekdays
    use                     generic-host
    check_period            24x7
    check_interval          3
    retry_interval          1
    max_check_attempts      3
    check_command           check-host-alive
    notification_period     weekdays
    notification_interval   60
    notification_options    d,r
    contact_groups          admins
    register                0
}
Code:
64 bytes from 192.168.251.80: icmp_seq=1071 ttl=58 time=55.1 ms
64 bytes from 192.168.251.80: icmp_seq=1072 ttl=58 time=48.0 ms
64 bytes from 192.168.251.80: icmp_seq=1073 ttl=58 time=48.9 ms
64 bytes from 192.168.251.80: icmp_seq=1074 ttl=58 time=46.9 ms
64 bytes from 192.168.251.80: icmp_seq=1075 ttl=58 time=47.0 ms
64 bytes from 192.168.251.80: icmp_seq=1076 ttl=58 time=45.9 ms
^C
--- 192.168.251.80 ping statistics ---
1076 packets transmitted, 1074 received, +3 errors, 0.185874% packet loss, time 1076458ms
rtt min/avg/max/mdev = 45.705/50.604/281.101/19.049 ms
admin@nagios:~$ /usr/local/nagios/libexec/check_ping -H 192.168.251.80 -w 600.0,80% -c 900.0,100% -p 100
PING CRITICAL - Packet loss = 100%|rta=900.000000ms;600.000000;900.000000;0.000000 pl=100%;80;100;0 <----- This failed immediately
admin@nagios:~$ /usr/local/nagios/libexec/check_ping -H 192.168.251.80 -w 600.0,80% -c 900.0,100% -p 100
PING OK - Packet loss = 0%, RTA = 50.72 ms|rta=50.715000ms;600.000000;900.000000;0.000000 pl=0%;80;100;0 <----- This succeeded after running all 100 pings with no packet loss
admin@nagios:~$
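To tally results like the two above over many runs, a small filter can pull the packet-loss figure out of the performance data that check_ping prints after the `|`. This is only a sketch: the function name packet_loss is hypothetical, and it assumes the `pl=<number>%` perfdata format shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Nagios plugins: read check_ping output
# lines on stdin and print the packet-loss percentage from the pl= perfdata
# field of each line. Lines without a pl= field are skipped.
packet_loss() {
    sed -n 's/.*[| ]pl=\([0-9][0-9]*\)%.*/\1/p'
}

# Example: count how many saved results reported 100% loss
# (results.txt is a placeholder file name holding one check_ping line per run):
# packet_loss < results.txt | grep -c '^100$'
```

Fed the two result lines above, the filter prints 100 for the failed run and 0 for the successful one.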