Our main Nagios server configuration used Mod Gearman with 3 remote workers.
Nagios XI Version : 2014R1.1
nagprod01.cellnet.com 2.6.32-279.11.1.el6.x86_64 x86_64
CentOS release 6.3 (Final)
nagios (pid 7793) is running...
NPCD running (pid 23335).
ndo2db (pid 2568) is running...
CPU Load 15: 3.09
Total Hosts: 1907
Total Services: 9263
8 Core CPU with 16GB memory
To remove Mod Gearman, the following modifications were implemented:
Modifications to /usr/local/nagios/etc/nagios.cfg
check_workers=6 - The default of 4 workers would not keep up
commented out the 2 “embedded_perl” entries
max_host_check_spread=60 - Required to provide a little more time to ramp up from a cold start.
max_service_check_spread=60 - Required to provide a little more time to ramp up from a cold start.
use_retained_program_state=1
use_retained_scheduling_info=1
commented out the broker_module= entry for Mod Gearman
service restart nagios
service gearmand stop
service mod_gearman_worker stop
chkconfig gearmand off - Prevents the process from starting at boot
chkconfig mod_gearman_worker off - Prevents the process from starting at boot
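The nagios.cfg edits listed above could be scripted roughly as follows. This is a minimal sketch, not a tested procedure for Nagios XI: the helper names (edit_in_place, set_cfg, comment_out, apply_no_gearman_settings) are made up for illustration, and it assumes the Mod Gearman broker line contains the string "mod_gearman" so that other broker_module entries are left active.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Nagios or Mod Gearman.

# edit_in_place FILE SED_EXPR: apply a sed expression through a temporary copy,
# writing back into the same file so ownership and permissions are kept.
edit_in_place() {
    sed "$2" "$1" > "$1.tmp" && cat "$1.tmp" > "$1" && rm -f "$1.tmp"
}

# set_cfg FILE KEY VALUE: replace an active "KEY=..." line, or append one.
set_cfg() {
    if grep -q "^$2=" "$1"; then
        edit_in_place "$1" "s|^$2=.*|$2=$3|"
    else
        printf '%s=%s\n' "$2" "$3" >> "$1"
    fi
}

# comment_out FILE REGEX: prefix each active line matching REGEX with "#".
comment_out() {
    edit_in_place "$1" "/$2/s|^|#|"
}

# apply_no_gearman_settings FILE: the nagios.cfg changes described in this post.
apply_no_gearman_settings() {
    cfg=$1
    set_cfg "$cfg" check_workers 6
    set_cfg "$cfg" max_host_check_spread 60
    set_cfg "$cfg" max_service_check_spread 60
    set_cfg "$cfg" use_retained_program_state 1
    set_cfg "$cfg" use_retained_scheduling_info 1
    # Both "embedded_perl" entries.
    comment_out "$cfg" '^[a-z_]*embedded_perl[a-z_]*='
    # Assumption: the Mod Gearman broker line contains "mod_gearman".
    comment_out "$cfg" '^broker_module=.*mod_gearman'
}
```

It is probably safest to run apply_no_gearman_settings against a copy of nagios.cfg first and check the result with /usr/local/nagios/bin/nagios -v before restarting the service.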
__________________________________________
Modifications to /usr/local/nagios/etc/pnp/npcd.cfg
load_threshold = 80.0 (10 times the number of CPU cores)
Removing Mod Gearman increases the system load and NPCD will shut down if this threshold is exceeded.
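The "10 times the number of CPU cores" rule of thumb can be computed rather than hard-coded. A small sketch, assuming getconf _NPROCESSORS_ONLN reports the core count on the server; the function name npcd_load_threshold is hypothetical:

```shell
#!/bin/sh
# npcd_load_threshold [CORES]: print 10 x CORES with one decimal place.
# With no argument, the online CPU count reported by getconf is used.
npcd_load_threshold() {
    cores=${1:-$(getconf _NPROCESSORS_ONLN)}
    printf '%d.0\n' "$((cores * 10))"
}
```

On the 8 core server described here, npcd_load_threshold 8 gives the 80.0 used for load_threshold in npcd.cfg.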
__________________________________________
Modification to /usr/local/nagios/etc/pnp/process_perfdata.cfg
TIMEOUT = 30 - Prevents timeouts while collecting perfdata under increased system load
__________________________________________
Once all modifications are made, a system restart helps ensure a clean start and a stable run afterwards.
shutdown -r 0 - Restarts the server with a clean, non-Mod Gearman configuration
Current status of server after removal of Mod Gearman
On smaller server configurations with 2 to 4 CPU cores and 2 to 4 GB of memory, I needed to throttle down the default Nagios 2014 application to keep it from overloading the server. Adjust the changes above as follows for smaller servers:
Modifications to /usr/local/nagios/etc/nagios.cfg
check_workers=1 - The default of 4 workers would overload the server
max_host_check_spread=120 - Required to provide a little more time to ramp up from a cold start.
max_service_check_spread=120 - Required to provide a little more time to ramp up from a cold start.
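The two sets of values in this post can be summarized as a lookup by core count. This is only a sketch: the function name nagios_profile is hypothetical, and the cut-off at 4 cores is an assumption, since only 2 to 4 core servers and an 8 core server were actually tried.

```shell
#!/bin/sh
# nagios_profile CORES: print the nagios.cfg values used in this post.
# Assumption: servers with 4 or fewer cores get the "small" values and
# everything larger gets the values from the 8 core server.
nagios_profile() {
    if [ "$1" -le 4 ]; then
        workers=1
        spread=120
    else
        workers=6
        spread=60
    fi
    printf 'check_workers=%s\n' "$workers"
    printf 'max_host_check_spread=%s\n' "$spread"
    printf 'max_service_check_spread=%s\n' "$spread"
}
```

For example, nagios_profile 2 prints the three smaller-server lines shown above, which can then be merged into nagios.cfg by hand.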
I also experimented with the max_concurrent_checks setting in nagios.cfg, but changing it appeared to have no effect. Comments on this setting are welcome.