False packet loss alert from Nagios

imran_khan
Post by imran_khan »

Hello,

I am getting false packet-loss alerts from Nagios for my servers.
Nagios is reporting packet loss. Please see the details below and advise.

check_ping command definition:

Code:

define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
        }
Service definition:

Code:

define service{
                use                             generic-service         ; Name of service template to use
                host_name                       example.com
                service_description             PING
                is_volatile                     0
                check_period                    24x7
                max_check_attempts              3
                normal_check_interval           30
                retry_check_interval            5
                contact_groups                  xyzgrp
                notification_interval           120
                notification_period             24x7
                notification_options            w,u,c,r
                check_command                   check_ping!30000,1%!100000,2%
                }

Edit:

Hello,

When I run the ping command manually, I don't get any packet loss.

Pinging server 1.1.1.1 from the Nagios server:

Code:

# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=128 time=0.624 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=128 time=0.325 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=128 time=0.355 ms
64 bytes from 1.1.1.1: icmp_seq=4 ttl=128 time=0.887 ms
64 bytes from 1.1.1.1: icmp_seq=5 ttl=128 time=0.544 ms
64 bytes from 1.1.1.1: icmp_seq=6 ttl=128 time=0.362 ms
64 bytes from 1.1.1.1: icmp_seq=7 ttl=128 time=0.578 ms
64 bytes from 1.1.1.1: icmp_seq=8 ttl=128 time=0.525 ms
64 bytes from 1.1.1.1: icmp_seq=9 ttl=128 time=0.960 ms
64 bytes from 1.1.1.1: icmp_seq=10 ttl=128 time=2.25 ms
64 bytes from 1.1.1.1: icmp_seq=11 ttl=128 time=14.0 ms
64 bytes from 1.1.1.1: icmp_seq=12 ttl=128 time=0.459 ms
64 bytes from 1.1.1.1: icmp_seq=13 ttl=128 time=0.619 ms
64 bytes from 1.1.1.1: icmp_seq=14 ttl=128 time=0.563 ms
64 bytes from 1.1.1.1: icmp_seq=15 ttl=128 time=1.70 ms
64 bytes from 1.1.1.1: icmp_seq=16 ttl=128 time=2.84 ms
64 bytes from 1.1.1.1: icmp_seq=17 ttl=128 time=0.402 ms
64 bytes from 1.1.1.1: icmp_seq=18 ttl=128 time=0.465 ms
64 bytes from 1.1.1.1: icmp_seq=19 ttl=128 time=0.384 ms
64 bytes from 1.1.1.1: icmp_seq=20 ttl=128 time=2.00 ms
64 bytes from 1.1.1.1: icmp_seq=21 ttl=128 time=0.464 ms
64 bytes from 1.1.1.1: icmp_seq=22 ttl=128 time=2.47 ms

--- 1.1.1.1 ping statistics ---
22 packets transmitted, 22 received, 0% packet loss, time 23032ms
rtt min/avg/max/mdev = 0.325/1.539/14.055/2.834 ms

Pinging 1.1.1.1 with check_ping from the Nagios server shows packet loss the first time I run the command, but not the second time:

Code:

# ./check_ping -H 1.1.1.1 -w 30000,1% -c 100000,2%
PING CRITICAL - Packet loss = 16%, RTA = 2.26 ms|rta=2.257000ms;30000.000000;100000.000000;0.000000 pl=16%;1;2;0

# ./check_ping -H 1.1.1.1 -w 30000,1% -c 100000,2%
PING OK - Packet loss = 0%, RTA = 1.16 ms|rta=1.163000ms;30000.000000;100000.000000;0.000000 pl=0%;1;2;0
Thanks,
Imran Khan.

Mod Note: I merged your two posts. If you are the last poster in a thread, please edit your previous post instead of replying to yourself. Also, be sure to wrap your code and CLI output in CODE tags.
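For reading check_ping results like the ones above: everything after the | is performance data, one item per metric in the form label=value[UOM];warn;crit;min;max. In pl=16%;1;2;0 the measured loss is 16% against a warning threshold of 1% and a critical threshold of 2%, and the rta item shows the round-trip thresholds are 30000 ms and 100000 ms (30 and 100 seconds). Below is a minimal Python sketch of a parser; the regex and the name parse_perfdata are illustrative assumptions, not part of Nagios, and it only handles simple unquoted labels.

```python
import re

# One perfdata item: label=value[UOM];warn;crit;min;max (trailing fields optional).
# Simplified assumption: labels are plain words (no quotes or spaces).
PERF_ITEM = re.compile(r"(\w+)=([-\d.]+)([a-zA-Z%]*)((?:;[-\d.]*)*)")


def parse_perfdata(plugin_output):
    """Split plugin output into its status text and a dict of perfdata items."""
    status, _, perf = plugin_output.partition("|")
    metrics = {}
    for label, value, uom, rest in PERF_ITEM.findall(perf):
        fields = rest.split(";")[1:]  # drop the empty string before the first ';'
        fields += [""] * (4 - len(fields))  # pad to warn, crit, min, max
        warn, crit, low, high = (float(f) if f else None for f in fields[:4])
        metrics[label] = {"value": float(value), "uom": uom, "warn": warn,
                          "crit": crit, "min": low, "max": high}
    return status.strip(), metrics


# The first check_ping run from this thread:
line = ("PING CRITICAL - Packet loss = 16%, RTA = 2.26 ms|rta=2.257000ms;"
        "30000.000000;100000.000000;0.000000 pl=16%;1;2;0")
status, perf = parse_perfdata(line)
print(status)                                # PING CRITICAL - Packet loss = 16%, RTA = 2.26 ms
print(perf["pl"])                            # {'value': 16.0, 'uom': '%', 'warn': 1.0, 'crit': 2.0, 'min': 0.0, 'max': None}
print(perf["rta"]["warn"], perf["rta"]["crit"])  # 30000.0 100000.0
```

Seen this way, the RTA thresholds can practically never trip, so every alert from this check comes from the packet-loss percentages.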
tmcdonald
Post by tmcdonald »

Is 1.1.1.1 the actual host you are checking with Nagios? If not, then testing against it on the command line is not an appropriate way to test for real packet loss.

Also, are you sure you have the right warning and critical values? 1% and 2% packet loss for only 5 pings sent out are essentially the same thing, as a single lost reply will be 20% already.
Former Nagios employee
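To see why 1% and 2% behave identically with five pings, a minimal Python sketch can enumerate every possible outcome. It deliberately ignores the RTA part of the thresholds and assumes that loss is computed as lost * 100 / sent (truncated to a whole percent) and that a value equal to a threshold counts as breaching it; the function names are hypothetical.

```python
def loss_percent(lost, sent):
    """Whole-number packet loss percentage for `lost` of `sent` echo replies."""
    return lost * 100 // sent


def loss_state(loss_pct, warn_pct, crit_pct):
    """Classify a loss percentage, assuming reaching a threshold (>=) breaches it."""
    if loss_pct >= crit_pct:
        return "CRITICAL"
    if loss_pct >= warn_pct:
        return "WARNING"
    return "OK"


def state_table(sent, warn_pct, crit_pct):
    """Map each possible number of lost replies to (loss %, resulting state)."""
    table = {}
    for lost in range(sent + 1):
        pct = loss_percent(lost, sent)
        table[lost] = (pct, loss_state(pct, warn_pct, crit_pct))
    return table


# The thresholds from the original service: -w ...,1% -c ...,2% with -p 5.
for lost, (pct, state) in state_table(5, 1, 2).items():
    print(f"{lost} of 5 lost -> {pct:3d}% {state}")
```

With these thresholds WARNING is unreachable: the first lost reply moves the check straight from OK (0%) to CRITICAL (20%).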
eloyd
Post by eloyd »

tmcdonald wrote: Also, are you sure you have the right warning and critical values? 1% and 2% packet loss for only 5 pings sent out are essentially the same thing, as a single lost reply will be 20% already.
100% agree. Packet loss via ping is often not really packet loss, just bad pings. 5 pings is not a lot, and each missed ping counts as 20% packet loss.

A better way to deal with percentage packet loss is to send 100 pings. That way, one lost ping = 1% lost in packets. Then you set your warn/critical values in percentages and don't have to worry about what they "really are" versus what they "appear to be."

So change -p 5 to -p 100 and then your -w and -c values will be actual percentages lost.
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
I'm a Nagios Fanatic! • Join our public Nagios Discord Server!
tmcdonald
Post by tmcdonald »

eloyd wrote: A better way to deal with percentage packet loss is to send 100 pings. That way, one lost ping = 1% lost in packets. Then you set your warn/critical values in percentages and don't have to worry about what they "really are" versus what they "appear to be."

So change -p 5 to -p 100 and then your -w and -c values will be actual percentages lost.
Gonna have to disagree with you there, and here's why:

Code:

root@nagiosxi: /usr/local/nagios/libexec
$ time ./check_ping -H 8.8.8.8 -w 200,60% -c 500,80% -p 5
PING OK - Packet loss = 0%, RTA = 34.57 ms|rta=34.567001ms;200.000000;500.000000;0.000000 pl=0%;60;80;0

real    0m4.042s
user    0m0.001s
sys     0m0.001s

root@nagiosxi: /usr/local/nagios/libexec
$ time ./check_ping -H 8.8.8.8 -w 200,60% -c 500,80% -p 100
PING OK - Packet loss = 0%, RTA = 36.85 ms|rta=36.849998ms;200.000000;500.000000;0.000000 pl=0%;60;80;0

real    1m39.185s
user    0m0.000s
sys     0m0.013s
Having a single check run for over a minute and a half is a recipe for disaster. I would just stick with the 5 pings and use multiples of 20 for percentages.
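The trade-off between the two suggestions can be estimated. This is a minimal Python sketch assuming ping sends one packet per second (its usual default), so N packets take about N - 1 seconds; that agrees with the timings above (about 4 s for -p 5 and 99 s for -p 100). The 60-second limit is an assumed check timeout used only for comparison (Nagios Core's service_check_timeout defaults to 60 seconds, but check your own nagios.cfg); ping_tradeoff is a hypothetical name.

```python
def ping_tradeoff(packets, interval_s=1.0):
    """Return (loss % per lost reply, estimated runtime in seconds) for check_ping -p N.

    Assumes replies are paced interval_s apart, so N packets take about
    (N - 1) * interval_s seconds plus a few milliseconds of round-trip time.
    """
    step_pct = 100 / packets
    est_seconds = (packets - 1) * interval_s
    return step_pct, est_seconds


CHECK_TIMEOUT_S = 60  # assumed timeout for comparison; use your own service_check_timeout

for packets in (5, 10, 20, 100):
    step, secs = ping_tradeoff(packets)
    verdict = "within" if secs < CHECK_TIMEOUT_S else "over"
    print(f"-p {packets:3d}: one lost reply = {step:g}% loss, "
          f"~{secs:g} s ({verdict} {CHECK_TIMEOUT_S} s)")
```

Finer granularity costs roughly one second of check time per extra packet, which is why staying at 5 packets and thinking in steps of 20% keeps the check short.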
eloyd
Post by eloyd »

Good point @tmcdonald. But this is why we don't use ping to determine packet loss for our VoIP networks. I think packet loss as demonstrated by ping can be misleading, since the protocol that you're using at OSI Layer 3 and above can sometimes mitigate packet loss problems so that the application doesn't even see dropped packets.

But you're still right about the -p 100. ;-)
tmcdonald
Post by tmcdonald »

imran_khan, can you clarify the questions I had?
imran_khan
Post by imran_khan »

Hello tmcdonald,

1.1.1.1 is not the real IP; the real one is my client's server IP, which I am pinging from the Nagios server with both ping and check_ping.
I am not sure whether warning = 1% and critical = 2% are correct. Please advise.

Thanks,
Imran Khan.
slansing
Post by slansing »

What they were telling you is that your thresholds are unreasonably low, especially for a ping that travels publicly over the internet. I would raise them to match both the typical live response and the number of pings you send. As tmcdonald mentioned after your initial post, with the standard 5 pings, losing even a single response already puts you at 20%; there is no way to get as granular as 1% or 2% with only 5 pings.
imran_khan
Post by imran_khan »

Hello,

Sorry, I don't follow.

I am not sure whether warning = 1% and critical = 2% are correct. What values should I use to avoid false packet-loss alerts from Nagios?

Thanks,
Imran Khan.
tmcdonald
Post by tmcdonald »

You are probably going to see some packet loss in any network, so with 5 pings being sent a good value for warning might be 40% and for critical might be 80%.
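Translating the suggested 40%/80% into lost replies makes the difference concrete. A minimal Python sketch, assuming loss is computed as lost * 100 / sent and that a threshold trips once it is reached (>=); min_lost_replies is a hypothetical helper name. Only the percentage parts of the -w and -c arguments change; the RTA values remain a separate choice.

```python
import math


def min_lost_replies(threshold_pct, sent):
    """Fewest lost replies (out of sent) whose loss percentage reaches threshold_pct.

    Assumes loss = lost * 100 / sent and that reaching the threshold (>=) trips it.
    """
    return math.ceil(threshold_pct * sent / 100)


SENT = 5  # check_ping -p 5, as in the original command definition
for label, warn, crit in (("original", 1, 2), ("suggested", 40, 80)):
    print(f"{label}: WARNING at {min_lost_replies(warn, SENT)} lost, "
          f"CRITICAL at {min_lost_replies(crit, SENT)} lost (of {SENT})")
```

With 40%/80%, a single dropped reply stays OK, two lost replies raise a WARNING, and four raise a CRITICAL, whereas 1%/2% both trip on the first lost reply.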