I am concerned with the way Nagios appears to handle passive alerts. As I
mentioned before, I am using a script to monitor a system farm of several
hundred machines. Every five minutes this script submits passive checks for
each machine into Nagios.
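For reference, here is a minimal sketch of what such a submission can look
like. It is not the actual script. The command file path, the function names,
and the host and service names are assumptions for illustration; only the
PROCESS_SERVICE_CHECK_RESULT line format comes from the Nagios external
command interface. It writes all results for one run in a single open of the
named pipe rather than one open per machine.

```python
import time


def format_passive_results(results, now=None):
    """Build external-command lines from (host, service, code, output) tuples."""
    ts = int(time.time()) if now is None else int(now)
    lines = []
    for host, service, code, output in results:
        # One line per result: [time] COMMAND;host;service;return_code;output
        lines.append(
            f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}\n"
        )
    return "".join(lines)


def submit(results, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    # Assumed path: use the command_file value from your own nagios.cfg.
    payload = format_passive_results(results)
    # Opening the FIFO blocks until Nagios has it open for reading.
    with open(command_file, "w") as pipe:
        pipe.write(payload)
```

Calling submit([("web01", "PING", 0, "OK")]) would send one result for a
hypothetical host web01.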
When I do this, I frequently see many (for large values of many, sometimes
more than 100) Nagios processes blocked on a lock file in the var directory.
It looks like this is due to the process that reads the passive checks from
the named pipe. This has frequently led to system loads over 100, and this
morning it brought the system to a grinding halt.
Does anyone have any idea why the passive checks are causing this problem? If
I stop the cron job that generates the checks and restart Nagios, the load
goes away and doesn't return. My whole point in using passive checks in the
first place was to avoid the load on the system caused by hundreds of
processes having to run every few minutes, but that seems to have backfired.
--
Dan Rich | http://www.employees.org/~drich/
| "Step up to red alert!" "Are you sure, sir?
| It means changing the bulb in the sign..."
| - Red Dwarf (BBC)
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]