[Nagios-devel] QUERY: Obsessive-Compulsive Processors obsessing too much?
Posted: Fri Apr 28, 2006 12:02 am
One of the things I ran into while tracking down my runaway process
issue is the handling of the obsessive-compulsive options. In
particular, while the service checks are run in parallel, the ocsp_command
is run in series (along with event handlers and so forth, all by
reap_service_checks). To illustrate, a set of 5 services may well be run
at the same time, taking T(service_run_time), but then the
ocsp_commands are run one after another, taking T(N*ocsp_command_time):
Parallel Service       | | | | |                                 T
Check Execution        | | | | |    ( run_service_checks() )
                       \ \ | / /                                 I
Reaper                  \ \|/ /
Interval                 \ | /                                   M
                           V        ( reap_service_checks )
                          ---       ( my_system )                E
Serial                    ---       ( my_system )
Obsessive-Compulsive      ---                                    |
Execution                 ---                                    V
                          ---
In the setup that I am working on, I have Nagios running at a rate of at
least 1 service check per second, with an ocsp_command to distribute the
results to other machines. Thus, every reaper_interval (10 seconds),
Nagios blocks while the ocsp_command runs to completion once for each
reaped service check.
Since the ocsp_command depends on TCP handshakes to complete, the
time it takes to finish is noticeably variable, and thus Nagios falls
further and further behind schedule.
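
To put rough numbers on this, here is a minimal C sketch of the arithmetic. The function names and the 0.5 second per-command cost used below are illustrative assumptions, not measurements and not Nagios code; the model simply treats everything else as stopped while the commands run one after another.

```c
/* Hypothetical back-of-envelope model; not taken from the Nagios source. */

/* Seconds spent running OCSP commands serially in one reaper pass. */
double serial_ocsp_seconds(double results, double ocsp_seconds)
{
    return results * ocsp_seconds;
}

/* Fraction of each reaper interval spent blocked in OCSP commands.
 * At or above 1.0, one pass takes at least as long as the interval. */
double blocked_fraction(double checks_per_second, double reaper_interval,
                        double ocsp_seconds)
{
    /* Results that accumulate between two reaper passes. */
    double results = checks_per_second * reaper_interval;
    return serial_ocsp_seconds(results, ocsp_seconds) / reaper_interval;
}
```

With 1 check per second and a 10 second interval, blocked_fraction(1.0, 10.0, 0.5) comes to 0.5, so half of every interval would go to waiting. If the command averaged a full second, a single pass would consume the whole interval under this model.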
My query is: do I shift my distributed monitoring to be more batched, and
run my distributed monitoring stuff off the periodic execution of
service_perfdata_file_processing_command, or do I change Nagios to run the
ocsp_command in a double fork like run_service_check()?
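
For reference, the double-fork option could look roughly like the sketch below. This is the generic POSIX pattern, not the actual Nagios code; run_detached() is a hypothetical helper name, and passing the command through /bin/sh is an assumption made for brevity.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start `cmd` via /bin/sh without waiting for it.
 * Returns 0 once the intermediate child has been reaped, -1 on error. */
int run_detached(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* First child: fork again and exit at once, so the grandchild
         * is re-parented and never becomes our zombie. */
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(1);
        if (grandchild == 0) {
            execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
            _exit(127); /* only reached if exec failed */
        }
        _exit(0);
    }

    /* Parent: the first child exits immediately, so this wait is short
     * and does not depend on how long `cmd` runs. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The cost of this approach is that the caller never sees the command's exit status and cannot enforce a timeout on it, so any such handling would have to move into the grandchild or the command itself.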
--==--
Bruce.
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]