Re: [Nagios-devel] Nagios 2.6 still not draining command pipe fast



In message ,
Ethan Galstad writes:

>John P. Rouillard wrote:
>> In message ,
>> Ethan Galstad writes:
>>
>>> John P. Rouillard wrote:
>>>> Hi all:
>>>>
>>>> I am trying to get my external correlation engine working with nagios
>>>> 2.x, and I just can't get nagios to drain the command pipe fast enough.
>>>> I see a failure rate of approximately 5% on writes to the command pipe,
>>>> which fail with EAGAIN.
>>>>
>>>> I have increased:
>>>>
>>>> nagios.h:#define COMMAND_BUFFER_SLOTS 20480
>>>> nagios.h:#define SERVICE_BUFFER_SLOTS 20480
>>>>
>>>> from the original 1024. After going from 10240 to 20480 I may see a
>>>> slight decrease (maybe 0.5%), but I think I just want to see it. I
>>>> don't think the difference is statistically significant.
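On the writer side, an EAGAIN from a non-blocking write only means the pipe was full at that moment, so a client can retry briefly instead of dropping the result. The following is a minimal sketch, assuming the client opens the command pipe with O_NONBLOCK; write_command() and the 10 ms back-off are hypothetical and are not part of Nagios or of the correlation engine described here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>    /* O_NONBLOCK, fcntl() for setting up the descriptor */
#include <limits.h>   /* PIPE_BUF */
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Write one complete external command line to a pipe opened with O_NONBLOCK.
 * On EAGAIN (pipe full) it sleeps briefly and retries, up to max_tries times.
 * Returns 0 on success, -1 on failure with errno set. */
int write_command(int fd, const char *line, int max_tries)
{
    size_t len = strlen(line);
    /* Pipe writes of at most PIPE_BUF bytes are atomic: the line is written
     * whole or not at all, and never interleaves with other writers. */
    if (len > PIPE_BUF) {
        errno = EMSGSIZE;
        return -1;
    }
    struct timespec backoff = {0, 10L * 1000 * 1000}; /* 10 ms: an assumed value */

    for (int i = 0; i < max_tries; i++) {
        ssize_t n = write(fd, line, len);
        if (n == (ssize_t)len)
            return 0;                   /* whole line accepted */
        if (n >= 0) {                   /* short write: not expected on a pipe here */
            errno = EIO;
            return -1;
        }
        if (errno == EINTR)
            continue;                   /* interrupted by a signal: try again */
        if (errno != EAGAIN)
            return -1;                  /* a real error, for example EPIPE */
        nanosleep(&backoff, NULL);      /* pipe full: give the reader time to drain it */
    }
    errno = EAGAIN;
    return -1;
}
```

A caller would format each line in the external command syntax, for example `[<time>] PROCESS_SERVICE_CHECK_RESULT;<host>;<service>;<code>;<output>` plus a newline, built with snprintf() and time(NULL). Note that open() on a FIFO with O_WRONLY | O_NONBLOCK fails with ENXIO when no process has the FIFO open for reading.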
>>> John - Does this problem still occur with Nagios 2.7 or the latest 2.x
>>> CVS code? A separate command file worker thread should be reading
>>> entries from the external command file as fast as it can read them (as
>>> long as there are free buffer slots).
>>>
>>> If there aren't any external commands, the thread waits 0.5 seconds
>>> before checking for new commands in the file. If you have occasional
>>> bursts of check results, this could be too long to wait. You could try
>>> experimenting with decreasing the 0.5 second delay. Around line 4948 of
>>> base/utils.c, you'll find...
>>>
>>> /* wait a bit */
>>> tv.tv_sec=0;
>>> tv.tv_usec=500000;
>>> select(0,NULL,NULL,NULL,&tv);
>>>
>>> You could try decreasing the value of tv.tv_usec to 100000 (0.1 seconds)
>>> and see if that helps at all.
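An alternative to shortening the fixed sleep is to block in select() on the command pipe's own descriptor with a timeout, so the reader wakes as soon as a writer adds data instead of at the next polling tick. This is a swapped-in technique, not what base/utils.c does; the helper names below are hypothetical, and the timeout used in the usage simply mirrors the 100000 microsecond value suggested above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_usec microseconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long timeout_usec)
{
    fd_set rfds;
    struct timeval tv;

    for (;;) {
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        /* select() may modify tv on Linux, so rebuild it before every call. */
        tv.tv_sec = timeout_usec / 1000000;
        tv.tv_usec = timeout_usec % 1000000;   /* keep tv_usec below one second */
        int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc < 0 && errno == EINTR)
            continue;                          /* signal: retry with the full timeout */
        if (rc < 0)
            return -1;
        return (rc > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
    }
}

/* Read whatever is available once fd is readable.
 * Returns bytes read, 0 on timeout or EOF, -1 on error. */
ssize_t read_when_ready(int fd, char *buf, size_t len, long timeout_usec)
{
    int rc = wait_readable(fd, timeout_usec);
    if (rc <= 0)
        return rc;
    return read(fd, buf, len);
}
```

One caveat: once every writer has closed a FIFO, select() reports it readable and read() returns 0 (EOF) on every call, so a reader built this way usually keeps a write descriptor of its own open on the FIFO to avoid spinning.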

I installed Nagios 2.7 last Thursday. The failure rate has since dropped
from 5% to around 0.7%. That may not be the steady state, though, since
it is still climbing: it was 0.5% a couple of days ago. I haven't tried
changing the sleep time mentioned above because of a dramatic increase
in average latency.
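For reference, Nagios defines check latency as the time between when a check was scheduled to run and when it actually started. The three figures on the latency lines quoted below read as min / max / average; the following minimal sketch shows how such a triple is computed from per-check timestamps. The struct and function names are hypothetical, not Nagios code.

```c
#include <stddef.h>

struct latency_stats {
    double min, max, avg;   /* seconds */
};

/* scheduled[i]: when check i was due to run; started[i]: when it actually ran.
 * Both are in seconds. Returns all zeros when n == 0. */
struct latency_stats latency_summary(const double *scheduled,
                                     const double *started, size_t n)
{
    struct latency_stats s = {0.0, 0.0, 0.0};
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        double lat = started[i] - scheduled[i];   /* per-check latency */
        if (i == 0 || lat < s.min)
            s.min = lat;
        if (i == 0 || lat > s.max)
            s.max = lat;
        sum += lat;
    }
    if (n > 0)
        s.avg = sum / (double)n;
    return s;
}
```

Timestamps are plain doubles here to keep the sketch short; how Nagios itself stores them is not assumed.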

I am now seeing average latency in the 20-second range, rather than the
1 second I was seeing with my Nagios 2.6 install. Oddly, the GUI is
showing:

Check Latency (Min / Max / Average): 0.00 sec / 109.37 sec / 34.685 sec

That doesn't agree with what nagiostats reports. The max latency is
understandable, as we have been having some network drops, but even in a
freshly started Nagios with no network issues, the latency is in the same
range after a couple of hours. A 5-day-old Nagios process was reporting
the following from nagiostats:

Nagios Stats 2.7
Copyright (c) 2003-2007 Ethan Galstad (www.nagios.org)
Last Modified: 01-19-2007
License: GPL

CURRENT STATUS DATA
----------------------------------------------------
Status File: /var/log/nagios/status.dat
Status File Age: 0d 0h 0m 1s
Status File Version: 2.7

Program Running Time: 5d 21h 28m 58s
Nagios PID: 29914
Used/High/Total Command Buffers: 0 / 45 / 4096
Used/High/Total Check Result Buffers: 96 / 441 / 4096

Total Services: 1876
Services Checked: 1696
Services Scheduled: 1627
Active Service Checks: 1692
Passive Service Checks: 184
Total Service State Change: 0.000 / 73.420 / 2.913 %
Active Service Latency: 0.000 / 90.954 / 19.948 sec
Active Service Execution Time: 0.000 / 55.244 / 4.032 sec
Active Service State Change: 0.000 / 73.420 / 3.188 %
Active Services Last 1/5/15/60 min: 870 / 1353 / 1414 / 1450
Passive Service State Change: 0.000 / 16.780 / 0.381 %
Passive Services Last 1/5/15/60 min: 123 / 175 / 176 / 177
Services Ok/Warn/Unk/Crit: 1400 / 24 / 274 / 178
Services Flapping: 0
Services In Downtime: 0

Total Hosts: 118
Hosts Checked: 118
Hosts Scheduled: 0
Active Host Checks: 118
Passive Host Checks: 0
Tota

...[email truncated]...
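The "Used/High/Total" buffer counters in the nagiostats output above can be read against a simple fixed-slot queue: Total is the number of slots, Used is how many currently hold an entry, and High is the most that were in use at once. The sketch below is hypothetical and is not the Nagios implementation; SLOTS stands in for a constant such as COMMAND_BUFFER_SLOTS, and a real producer and consumer running in separate threads would need a mutex around slot_push() and slot_pop().

```c
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 4   /* hypothetical size; compare COMMAND_BUFFER_SLOTS in nagios.h */

struct slot_buffer {
    const char *slot[SLOTS];
    size_t head;   /* next slot to read */
    size_t tail;   /* next slot to write */
    size_t used;   /* "Used": slots currently holding an entry */
    size_t high;   /* "High": largest value "used" has reached */
};

/* Producer side: store item, or return false when every slot is taken. */
bool slot_push(struct slot_buffer *b, const char *item)
{
    if (b->used == SLOTS)
        return false;
    b->slot[b->tail] = item;
    b->tail = (b->tail + 1) % SLOTS;   /* wrap around the fixed array */
    b->used++;
    if (b->used > b->high)
        b->high = b->used;             /* track the high-water mark */
    return true;
}

/* Consumer side: take the oldest item, or return NULL when empty. */
const char *slot_pop(struct slot_buffer *b)
{
    if (b->used == 0)
        return NULL;
    const char *item = b->slot[b->head];
    b->head = (b->head + 1) % SLOTS;
    b->used--;
    return item;
}
```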


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]