Too Few Checks Running Per Minute
Posted: Fri Dec 23, 2011 3:15 pm
Hello everybody.
I'm running Nagios Core 3.3.1 on CentOS 5.7, 64-bit, with 8 cores (Intel(R) Xeon(R) CPU E5420 @ 2.50GHz) and 8G of RAM. Non-distributed, everything on one box.
The system is working well enough, but I know it can run faster. We're monitoring quite a few hosts here (over 6000). The system's load average is hovering around 2. The CPU usage is hovering around 12% (overall) according to top. Currently, it takes about 6-8 minutes for all of the hosts to be checked.
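To put rough numbers on that, here is a quick back-of-the-envelope sketch of the check rate implied by the figures above, next to the rate a 2-minute cycle would need. It counts host checks only (service checks would add to it) and treats 6000 as a lower bound, so take it as an estimate rather than a measurement.

```python
# Rough throughput arithmetic for the numbers in this post.
# Counts host checks only; service checks would add to the totals.

def checks_per_second(num_checks: int, cycle_seconds: float) -> float:
    """Average checks per second needed to finish num_checks in cycle_seconds."""
    return num_checks / cycle_seconds

HOSTS = 6000  # "over 6000" in the post, so treat this as a lower bound

current_fast = checks_per_second(HOSTS, 6 * 60)  # full pass in 6 minutes
current_slow = checks_per_second(HOSTS, 8 * 60)  # full pass in 8 minutes
target = checks_per_second(HOSTS, 2 * 60)        # full pass in 2 minutes

print(f"current: {current_slow:.1f} to {current_fast:.1f} host checks/s")
print(f"target:  {target:.1f} host checks/s")
```

So the box is doing somewhere around 12.5 to 16.7 host checks per second, and a 2-minute cycle would need about 50, roughly 3 to 4 times the current rate.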
Looking at the specs and workload of the machine, I know that time can be cut at least in half. I've been reading through the Nagios documentation and our configuration, and googling for hints, but I'm still stuck.
So far, we've switched host and service checking to use dumb scheduling (host_inter_check_delay_method=n and service_inter_check_delay_method=n), set max_concurrent_checks to 0, made sure the host and service check_intervals are 2 minutes, and set check_result_reaper_frequency to 2. Nagios is still refusing to use this machine to its potential. It's not even coming close.
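One way to sanity-check these settings is Little's law: the number of checks that have to be in flight at once is roughly the check rate times how long each check takes. The sketch below is only illustrative. I don't have measured execution times to hand, so the average durations in it are hypothetical placeholders; nagiostats reports the real execution time and latency figures to plug in.

```python
import math

def required_concurrency(checks_per_second: float, avg_check_seconds: float) -> int:
    """Little's law: checks in flight = check rate * time each check takes."""
    return math.ceil(checks_per_second * avg_check_seconds)

# 6000 hosts checked once every 2 minutes (host checks only).
TARGET_RATE = 6000 / 120

# Hypothetical average check durations in seconds, NOT measured values.
for avg_seconds in (0.5, 2.0, 5.0):
    in_flight = required_concurrency(TARGET_RATE, avg_seconds)
    print(f"avg check {avg_seconds:.1f}s -> about {in_flight} checks in flight")
```

If the real average turns out to be on the high side (slow plugins, timeouts on unreachable hosts), the number of simultaneous checks needed grows with it, which would be one reason a lightly loaded box still can't finish a pass in 2 minutes.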
I've gone through the performance tuning guide as well.
At this point, I'm stumped. Does anybody have any advice or ideas on things I could be missing that would cause Nagios to simply not run more checks per second? Or is there more information that is needed?