There seems to be no way of getting nagios-1.2 to read passive service check results from the external command file more often than once per second. In our tests, setting

    command_check_interval=-1

produces far worse results than setting

    command_check_interval=1s
Problem:
If the above holds in general, then with an average command length of 120 characters written to the external command file and a FIFO length of 4096 characters, nagios-1.2 cannot handle more than about 34 passive checks a second (4096 / 120 ≈ 34 commands per read, one read per second). We want our passive-only nagios-1.2 hosts to process at least 100 checks a second, and we see no reason why a Nagios host that does nothing but receive and forward checks can't do that.
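The arithmetic behind that ceiling can be sketched in C as below. It assumes, as described above, that each check of the command file picks up at most one FIFO buffer's worth of whole commands; the function names are hypothetical and not part of Nagios.

```c
#include <assert.h>

/* Hypothetical model: one read of the external command file yields at
 * most one FIFO buffer's worth of whole commands. Integer division drops
 * the remainder, so only complete commands are counted. */
static unsigned commands_per_read(unsigned fifo_bytes, unsigned avg_cmd_bytes)
{
    assert(avg_cmd_bytes > 0);
    return fifo_bytes / avg_cmd_bytes;
}

/* Upper bound on passive checks per second for a given read frequency. */
static unsigned max_checks_per_second(unsigned fifo_bytes,
                                      unsigned avg_cmd_bytes,
                                      unsigned reads_per_second)
{
    return commands_per_read(fifo_bytes, avg_cmd_bytes) * reads_per_second;
}

/* Smallest number of reads per second needed to reach target_rate,
 * rounded up. Returns 0 when not even one command fits in the FIFO,
 * i.e. the target cannot be reached under this model. */
static unsigned reads_needed(unsigned target_rate,
                             unsigned fifo_bytes,
                             unsigned avg_cmd_bytes)
{
    unsigned per_read = commands_per_read(fifo_bytes, avg_cmd_bytes);
    if (per_read == 0)
        return 0;
    return (target_rate + per_read - 1) / per_read;
}
```

With 120-byte commands, `max_checks_per_second(4096, 120, 1)` gives 34, and `reads_needed(100, 4096, 120)` gives 3: under this model, roughly three command-file reads per second would be enough for 100 checks a second.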
Possible hack:
Patch the nagios-1.2 source to use microseconds since the epoch instead of seconds since the epoch when scheduling checks (if we do this for external commands, I think everything else has to follow). Together with an 'm' (milliseconds) suffix for the option, we could then specify e.g. command_check_interval=100m to make Nagios check its external command file 10 times a second. This would mean modifying the sources to use gettimeofday() instead of time(), redefining the TIMED_EVENT struct, and dividing/multiplying values everywhere else TIMED_EVENT is used. Do you think this is a sound way of handling this? If anyone has done this already (or anything else that significantly speeds up external command checks), could you please mail the patches to the list?
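A minimal C sketch of two pieces this hack would need: a microsecond clock built on gettimeofday(), and a parser for command_check_interval that accepts the proposed 'm' suffix. The function names, the -1 sentinel and the error convention are assumptions for illustration; this is not Nagios code and does not show the TIMED_EVENT changes. A bare number is treated as a multiple of interval_length, which is how I understand the 1.2 option to work.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>

/* Hypothetical helper: current time in microseconds since the epoch,
 * using gettimeofday() instead of time(). */
static long long now_usec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

/* Hypothetical parser for command_check_interval:
 *   "-1"   -> -1 (check as often as possible)
 *   "1s"   -> 1000000 microseconds
 *   "100m" -> 100000 microseconds (proposed 'm' = milliseconds suffix)
 *   "2"    -> 2 * interval_length seconds, in microseconds
 * Returns 0 on success, -1 on a malformed value. */
static int parse_check_interval(const char *value, long interval_length,
                                long long *out_usec)
{
    char *end;
    long n;

    assert(value != NULL && out_usec != NULL);
    n = strtol(value, &end, 10);
    if (end == value)
        return -1;                      /* no digits at all */
    if (n == -1 && *end == '\0') {
        *out_usec = -1;                 /* "as often as possible" */
        return 0;
    }
    if (n < 0)
        return -1;
    switch (*end) {
    case 's':  *out_usec = n * 1000000LL; end++; break;
    case 'm':  *out_usec = n * 1000LL;    end++; break;
    case '\0': *out_usec = (long long)n * interval_length * 1000000LL; break;
    default:   return -1;               /* unknown suffix */
    }
    return *end == '\0' ? 0 : -1;       /* reject trailing junk */
}
```

Scheduling the next command-file check would then become `now_usec() + interval_usec` rather than `time(NULL) + seconds`.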
According to rumour, nagios-2.0 keeps the FIFO empty. How is this done in 2.0? Has anyone performance-tested external command handling in nagios-2.0? Any chance of a backport to 1.2, Ethan?
Test description:
Locally on the Nagios server, we echoed 5000 lines of real nsca data into the external command file as fast as possible, one "echo $some_check_result > nagios.cmd" per check, and measured how long it took.
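The test above used a shell loop; a rough C equivalent is sketched below. It opens the command file once per result (mirroring one echo per check), writes a PROCESS_SERVICE_CHECK_RESULT line, and returns the elapsed time. The host and service names and the plugin output are placeholders, not real nsca data, and the command file path is whatever nagios.cfg points at.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Write `count` passive check results to `cmd_path`, opening the file once
 * per result like the one-echo-per-check shell test. Returns the elapsed
 * wall-clock time in seconds, or -1.0 on error. Host/service names and
 * plugin output are placeholders. */
static double send_passive_checks(const char *cmd_path, int count)
{
    struct timeval start, end;
    char line[256];
    int i;

    assert(cmd_path != NULL && count >= 0);
    gettimeofday(&start, NULL);
    for (i = 0; i < count; i++) {
        int len = snprintf(line, sizeof line,
                           "[%ld] PROCESS_SERVICE_CHECK_RESULT;host%d;svc%d;0;OK - test\n",
                           (long)time(NULL), i % 100, i);
        if (len < 0 || (size_t)len >= sizeof line)
            return -1.0;
        /* On a FIFO, open() blocks until a reader (Nagios) has it open. */
        int fd = open(cmd_path, O_WRONLY | O_APPEND);
        if (fd < 0)
            return -1.0;
        /* One write() per line; FIFO writes of at most PIPE_BUF bytes
         * are atomic, so lines from parallel writers do not interleave. */
        if (write(fd, line, (size_t)len) != len) {
            close(fd);
            return -1.0;
        }
        close(fd);
    }
    gettimeofday(&end, NULL);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}
```

Once the FIFO buffer is full, each write() blocks until Nagios reads, so `5000 / send_passive_checks(path, 5000)` approximates the rate at which Nagios drains the command file.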
With Nagios checking "as often as possible" (command_check_interval=-1), the results vary a lot; the best runs reached 15 checks/second, and the worst took so long I couldn't be bothered waiting. With a check once per second, we get results very close to the theoretical maximum of approx. 33 checks per second for a FIFO length of 4096, since our average number of characters per check written to the FIFO is 123 (4096 / 123 ≈ 33).
One problem with command_check_interval=-1 seems to be that Nagios won't re-read the FIFO until it has finished processing all checks from the previous read. Note that we used (and reused) the same couple-of-days-old data for these tests. Since I'm not sure exactly how nagios-1.2 computes when to schedule the next command file read with command_check_interval=-1, I don't know whether this invalidates this part of the test results.
We tested on a Compaq DL360 (P3 1133 MHz, 256 MB memory) running Debian GNU/Linux with kernel 2.4.25 and nagios-1.2. If need be, we'll use better hardware in production, but both load and memory use were low during these tests, so the hardware should not have affected the results. No web server was running on the Nagios host under test.
Thank you in advance for all input, and have a good weekend!
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]