Nagios as cause of ping or service check failure/timeout
Posted: Wed Jul 11, 2018 5:18 pm
Hi. This is an odd question. We use ping for host checks, plus several service checks that connect to the remote server on specific TCP ports.
I want to know if I can rule out the Nagios server itself as a likely cause of a ping failure or a check timeout. I've checked the server load and iowait stats for the relevant time periods. In each case, they are within normal limits (iowait around 0.27% and 1-minute load below 3).
Is there another place to look? Is there a way to examine the length of a "check processing queue" at the times when Nagios declares that a specific check has timed out or failed? Or is there some other way to give weight to the idea that some checks are marked as failed because Nagios did not process them in time?
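To make the "queue length" idea concrete: as far as I understand it, Nagios Core records a check_latency value (seconds between when a check was scheduled and when it actually started) and a check_execution_time value for each host and service in its status file. The sketch below is only an illustration of how those two fields could be pulled out and compared. The function names and the 5-second threshold are my own assumptions, and the status file location varies by install (it is set by the status_file directive in nagios.cfg).

```python
import sys


def parse_status_blocks(text):
    """Parse status.dat-style text into a list of dicts, one per block.

    Each block looks like 'servicestatus {', then key=value lines, then '}'.
    """
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if current is None and line.endswith("{"):
            # Start of a new block; remember its type (e.g. 'hoststatus').
            current = {"_type": line[:-1].strip()}
        elif line == "}":
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return blocks


def high_latency_checks(text, threshold_seconds=5.0):
    """Return (name, latency, execution_time) for checks that started late.

    threshold_seconds is a hypothetical cutoff, not a Nagios default.
    """
    results = []
    for block in parse_status_blocks(text):
        if block["_type"] not in ("hoststatus", "servicestatus"):
            continue
        try:
            latency = float(block.get("check_latency", "0"))
            exec_time = float(block.get("check_execution_time", "0"))
        except ValueError:
            continue
        if latency >= threshold_seconds:
            host = block.get("host_name", "?")
            desc = block.get("service_description")
            name = host if desc is None else f"{host}/{desc}"
            results.append((name, latency, exec_time))
    # Worst latency first.
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    # Usage: python latency.py /path/to/status.dat  (path is install-specific)
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            for name, latency, exec_time in high_latency_checks(fh.read()):
                print(f"{name}: latency={latency:.3f}s exec={exec_time:.3f}s")
```

If latency stays low around the time of a failure while the execution time runs up to the plugin timeout, that would seem to point away from the Nagios server and toward the network or the remote host. The status file only holds the most recent values, though, so it would have to be sampled periodically to line up with past failures.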
My initial thought is that a timeout has to be caused by the network or the remote server, where "the remote server" includes the agent running on it.
Thanks!