Ton Voon wrote:
> This is the test case:
> * set max_concurrent_checks=1 in nagios.cfg
> * create a host with 3 services with a check_interval of 1 minute
> * restart nagios
> * go to the host page and schedule a check for all services on the
> host (this makes all the services run at the same time)
> * tail nagios.log. Should see "Max concurrent service checks (1)
> has been reached"
> * on the host page, notice the last run time. Only one will be
> updated after 1 minute. All services get scheduled for the next time
> at the same time, and after the next minute, only one of those will
> have the last check time changed
>
Yep, I see exactly the behavior you describe. I set up a standalone
machine running the default checks against itself, and the queue shows
them all scheduled for the same time the next minute. The log entries
also appear as you describe.
> I've just committed a patch into CVS HEAD. This nudges the time ahead
> by 5 + random(10) seconds. I've also included a test case which
> ensures that the nudge factor is added in these cases.
>
> nagios.log will also have an entry which lists the affected service.
> If you get this message a lot on a regular system, then you need to
> consider increasing the max_concurrent_checks value.
>
> I'd be grateful if you could try this out.
>
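For reference, the nudge described above can be sketched roughly as
follows. This is only an illustration in Python, not the code committed
to CVS. The function name nudged_next_check and the service names are
made up, and it assumes that the nudge is applied relative to the
current time and that random(10) yields an integer from 0 to 9.

```python
import random


def nudged_next_check(now, rng=random):
    """Return a rescheduled check time of now + 5 + random(10) seconds.

    Assumption: random(10) means an integer in 0..9, so the delay is
    between 5 and 14 seconds. The real patch may differ.
    """
    return now + 5 + rng.randrange(10)


# Hypothetical services that were all deferred at the same moment.
now = 1244883909
rng = random.Random(0)  # seeded only to make the illustration repeatable
for name in ["svc_a", "svc_b", "svc_c"]:
    delay = nudged_next_check(now, rng) - now
    print(f"{name}: rescheduled {delay} seconds ahead")
```

Because each deferred check draws its own random offset, the checks
tend to land on different seconds instead of all being rescheduled for
the same time.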
With the patch, the checks are now spread out in the queue, and all the
services seem to be checked sooner than without the patch; at least
that is what I noticed. There is one odd behavior: with the default
tests running, one check kept getting nudged and as a result wasn't run
for a while. Attached is the nagios.log; the first two restarts are
without the patch, and the later ones are with the patch. For the
entire time I ran with the patch, the "current users" check was never
run. Am I doing something wrong in testing this, though?
> Thinking some more, setting the next check time ahead doesn't really
> make sense, because the latency value does not reflect the fact that
> this active service's check time was delayed. Maybe this should be
> implemented as a remove of the event from the queue, and then re-added
> with a nudged event run time but the old service->next_check time.
>
> Anyhow, this should be better than it was.
Agreed about the latency, although it does log the incident, so users
should be able to work out why their checks are running a little late.
I'm not sure about the event queue and how it works yet; I haven't
looked at that part of Nagios.
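To make the latency point concrete, here is a small Python sketch of
the two approaches being compared. It is a toy model, not Nagios code:
the Check class, the latency helper, and the 7 second nudge are all
assumptions for illustration, with latency taken to be the actual start
time minus the scheduled next_check time.

```python
from dataclasses import dataclass


@dataclass
class Check:
    next_check: int  # time the service was supposed to be checked
    run_time: int    # time the queued event actually fires


def latency(check, started_at):
    """Seconds between the scheduled check time and the actual start."""
    return started_at - check.next_check


original_due = 1244883909
nudge = 7  # hypothetical value of 5 + random(10)

# Approach A: move next_check itself forward (what the patch describes).
a = Check(next_check=original_due + nudge, run_time=original_due + nudge)

# Approach B: keep the old next_check and only move the event run time.
b = Check(next_check=original_due, run_time=original_due + nudge)

print(latency(a, a.run_time))  # 0, the delay is hidden
print(latency(b, b.run_time))  # 7, the delay shows up as latency
```

In this model, only the second approach lets the reported latency
reflect that the check was held back.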
Attachment: nagios.log
[1244883857] Nagios 3.1.0 starting... (PID=14070)
[1244883857] Local time is Sat Jun 13 05:04:17 EDT 2009
[1244883857] LOG VERSION: 2.0
[1244883857] Finished daemonizing... (New PID=14071)
[1244883908] EXTERNAL COMMAND: SCHEDULE_HOST_SVC_CHECKS;localhost;1244883908
[1244883909] Max concurrent service checks (1) has been reached. Delaying further checks until previous checks are complete...
[1244883909] Max concurrent service checks (1) has been reached. Delaying further checks until previous checks are complete...
[1244883909] Max concurrent service checks (1) has been reached. Delaying further checks until previous checks are complete...
[1244883909] Max concurrent service checks (1) has been reached. Delaying further checks until previous checks are complete...
[1244883910] Max concurrent service checks (1) has been reached. Delaying further checks until previous checks are complete...
[1244883910] Max concurrent service checks (1) has been reached. Delaying further checks until previous checks are complete...
[1244883910] Max concurrent service checks (1) has been reached. Delaying further checks until previous checks are complete...
[1244883910] Max concurrent service checks (1) has been reached. Delaying further checks until previous ch
...[email truncated]...
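A quick way to see how often the limit is hit in a log like the one
above is to count the "Max concurrent service checks" lines per
timestamp. The following Python sketch assumes the "[epoch] message"
line layout shown in the excerpt; the function name
count_max_concurrent is made up, and the sample lines are copied from
the log above.

```python
import re
from collections import Counter

# Assumed line layout: "[<epoch seconds>] <message>"
LINE_RE = re.compile(r"^\[(\d+)\] (.*)$")
MARKER = "Max concurrent service checks"


def count_max_concurrent(lines):
    """Count max-concurrency messages per epoch timestamp."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2).startswith(MARKER):
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "[1244883908] EXTERNAL COMMAND: SCHEDULE_HOST_SVC_CHECKS;localhost;1244883908",
    "[1244883909] Max concurrent service checks (1) has been reached. "
    "Delaying further checks until previous checks are complete...",
    "[1244883909] Max concurrent service checks (1) has been reached. "
    "Delaying further checks until previous checks are complete...",
    "[1244883910] Max concurrent service checks (1) has been reached. "
    "Delaying further checks until previous checks are complete...",
]

for timestamp, count in sorted(count_max_concurrent(sample).items()):
    print(timestamp, count)
```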
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]