Strange Problem: RTA = 33726.83ms (Packet Loss = 0%)

Post by **silverbenz** » Sun Dec 02, 2012 5:22 pm

Hi,

I have a Nagios Core instance (3.2.3) which periodically shows very unusual results against a small number of hosts. An example:

HOST: Status Information: PING CRITICAL - Packet loss = 0%, RTA = 33726.83 ms

The log files show that the RTA did not increase gradually; it jumped straight to the number above.
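For anyone wanting to check for the same jump, here is a minimal sketch of how the RTA values for one host could be pulled out of the Nagios log. It assumes the usual `[epoch] HOST ALERT: host;STATE;TYPE;attempt;output` line layout and check_ping-style output; the host name `router1` and the sample lines are made up for illustration and are not taken from the real log.

```python
import re

# Matches the RTA figure in check_ping-style output, e.g. "RTA = 33726.83 ms".
RTA_RE = re.compile(r"RTA = ([0-9.]+) ms")
# Assumed log layout: "[epoch] HOST ALERT: host;STATE;TYPE;attempt;output".
HOST_ALERT_RE = re.compile(
    r"^\[(\d+)\] HOST ALERT: ([^;]+);([^;]+);([^;]+);(\d+);(.*)$"
)


def rta_history(log_lines, host):
    """Return (timestamp, rta_ms) pairs for one host, in log order."""
    history = []
    for line in log_lines:
        m = HOST_ALERT_RE.match(line.strip())
        if not m or m.group(2) != host:
            continue  # not a host alert, or a different host
        rta = RTA_RE.search(m.group(6))
        if rta:
            history.append((int(m.group(1)), float(rta.group(1))))
    return history


# Hypothetical sample lines (illustration only).
sample = [
    "[1354400000] HOST ALERT: router1;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 140.12 ms",
    "[1354403600] HOST ALERT: router1;DOWN;HARD;1;PING CRITICAL - Packet loss = 0%, RTA = 33726.83 ms",
]

for stamp, rta in rta_history(sample, "router1"):
    print(stamp, rta)
```

A sudden step between two consecutive entries, rather than a slow climb, matches what is described here.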

The host is up, but Nagios has marked it as down without sending any notifications. All of the services on the host are up, including "ping", which is showing an RTA of 142.79 right now. This only happens on a small number of hosts. When it starts happening, Nagios also stops checking the host: the Last Check Time for the host in the example above was two days ago. Forcing a check via the "Reschedule the next check..." link fixes the issue and restarts the checks for this host.
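As a stopgap rather than a fix for the root cause, the manual reschedule could be scripted. The sketch below assumes the standard `hoststatus { ... }` blocks of `status.dat` with `host_name` and `last_check` fields, and uses the `SCHEDULE_FORCED_HOST_CHECK` external command. The one-hour threshold, the host names, and the sample status text are hypothetical, and writing the command line to the Nagios command file is left out because its path depends on the installation.

```python
def parse_host_status(status_text):
    """Collect key=value pairs from each 'hoststatus {' block of status.dat."""
    hosts, block = [], None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line == "hoststatus {":
            block = {}
        elif line == "}" and block is not None:
            hosts.append(block)
            block = None
        elif block is not None and "=" in line:
            key, value = line.split("=", 1)
            block[key] = value
    return hosts


def stale_hosts(hosts, now, max_age_s):
    """Names of hosts whose last check is older than max_age_s seconds."""
    return [
        h.get("host_name", "")
        for h in hosts
        if now - int(h.get("last_check", 0)) > max_age_s
    ]


def forced_check_command(host, now):
    """External command line asking Nagios for an immediate host check."""
    return f"[{now}] SCHEDULE_FORCED_HOST_CHECK;{host};{now}"


# Hypothetical status.dat excerpt (illustration only).
now = 1354468920  # in real use: int(time.time())
sample_status = f"""
hoststatus {{
    host_name=router1
    current_state=1
    last_check={now - 2 * 86400}
    }}
hoststatus {{
    host_name=router2
    current_state=0
    last_check={now - 60}
    }}
"""

for name in stale_hosts(parse_host_status(sample_status), now, 3600):
    print(forced_check_command(name, now))
```

Each printed line would then be appended to the Nagios external command file, which has the same effect as clicking the reschedule link with the force option ticked.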

Anyone have any idea what might be causing this issue and how I might be able to fix it?

Cheers,
Ben.

PS: I have another Nagios instance at another site monitoring the same devices, and it never has this problem. Both instances run the same release (3.2.3).