Nagios XI 2014: Migrating Off Mod Gearman
Posted: Wed Jun 18, 2014 4:11 pm
I recently completed the migration of 28 servers to Nagios XI 2014R1.1. After seeing the performance improvements, I debated whether to move off Mod Gearman. I finally decided to do so after running into some minor display problems in the Thruk web interface, caused by the modifications Mod Gearman requires on the 2014R1.1 release. I actually believe Nagios XI 2014 runs better without Mod Gearman once it is properly tuned. Since it took several rounds of tuning to stabilize some of my larger server configurations, I thought I would share my experience in the hope of saving others some time.
Our main Nagios server configuration used Mod Gearman with 3 remote workers.
Nagios XI Version : 2014R1.1
nagprod01.cellnet.com 2.6.32-279.11.1.el6.x86_64 x86_64
CentOS release 6.3 (Final)
nagios (pid 7793) is running...
NPCD running (pid 23335).
ndo2db (pid 2568) is running...
CPU Load (15-minute average): 3.09
Total Hosts: 1907
Total Services: 9263
8 Core CPU with 16GB memory
To remove Mod Gearman, I made the following modifications:
Modifications to /usr/local/nagios/etc/nagios.cfg
check_workers=6 - The default of 4 workers could not keep up
Commented out the two embedded_perl entries
max_host_check_spread=60 - Gives Nagios more time (in minutes) to ramp up the initial host checks after a cold start
max_service_check_spread=60 - Gives Nagios more time (in minutes) to ramp up the initial service checks after a cold start
use_retained_program_state=1
use_retained_scheduling_info=1
Commented out the broker_module= line that loads Mod Gearman
service nagios restart
service gearmand stop
service mod_gearman_worker stop
chkconfig gearmand off - Prevents the service from starting at boot
chkconfig mod_gearman_worker off - Prevents the service from starting at boot
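For anyone doing this across many servers, the nagios.cfg part can be scripted. Below is a minimal Python sketch, not a Nagios XI tool: the function name remove_mod_gearman is hypothetical, and it assumes the two embedded Perl entries are enable_embedded_perl and use_embedded_perl_implicitly (the names found in a stock nagios.cfg). It comments out only broker_module lines that mention mod_gearman, so the NDOUtils (ndomod) broker line that XI depends on is left alone, and it returns the edited text instead of writing the file.

```python
# Assumption: the two embedded Perl entries in a stock nagios.cfg.
EMBEDDED_PERL_KEYS = {"enable_embedded_perl", "use_embedded_perl_implicitly"}

# Values used on the 8-core main server in this post.
MAIN_SERVER_SETTINGS = {
    "check_workers": "6",
    "max_host_check_spread": "60",     # minutes
    "max_service_check_spread": "60",  # minutes
    "use_retained_program_state": "1",
    "use_retained_scheduling_info": "1",
}


def remove_mod_gearman(cfg_text, settings=MAIN_SERVER_SETTINGS):
    """Return nagios.cfg text with Mod Gearman disabled and `settings` applied."""
    out, seen = [], set()
    for line in cfg_text.splitlines():
        stripped = line.strip()
        key, sep, value = stripped.partition("=")
        key = key.strip()
        if not sep or stripped.startswith("#"):
            out.append(line)          # blank lines and comments pass through
        elif key == "broker_module" and "mod_gearman" in value:
            out.append("#" + line)    # disable only the Mod Gearman module
        elif key in EMBEDDED_PERL_KEYS:
            out.append("#" + line)    # embedded Perl entries are commented out
        elif key in settings:
            out.append(f"{key}={settings[key]}")
            seen.add(key)
        else:
            out.append(line)          # everything else (e.g. ndomod) is kept
    # Append any requested settings that were not already in the file.
    out.extend(f"{k}={v}" for k, v in settings.items() if k not in seen)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative input only; real broker_module paths vary by install.
    sample = (
        "broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/module.conf\n"
        "broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg\n"
        "enable_embedded_perl=1\n"
        "check_workers=4\n"
    )
    print(remove_mod_gearman(sample), end="")
```

Run it against a copy of nagios.cfg and verify the result with /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg before running the service commands.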
__________________________________________
Modifications to /usr/local/nagios/etc/pnp/npcd.cfg
load_threshold = 80.0 (10 times the number of CPU cores)
Removing Mod Gearman increases the system load, and NPCD will shut down if this threshold is exceeded.
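The 10x rule is easy to compute per server. A small Python sketch, where npcd_load_threshold is a hypothetical helper name and the factor of 10 is simply the rule of thumb used in this post, not an NPCD default:

```python
import os


def npcd_load_threshold(cores=None, factor=10.0):
    """Return a load_threshold of `factor` x CPU cores (10x is this post's rule of thumb)."""
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return float(cores) * factor


if __name__ == "__main__":
    # Prints the npcd.cfg line for the current machine.
    print(f"load_threshold = {npcd_load_threshold():.1f}")
```

On the 8-core server above, npcd_load_threshold(8) returns 80.0. Note that os.cpu_count() counts logical CPUs, so on a hyper-threaded server it may report twice the physical core count; pass cores explicitly if you want the threshold based on physical cores.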
__________________________________________
Modification to /usr/local/nagios/etc/pnp/process_perfdata.cfg
TIMEOUT = 30 - Prevents timeouts while processing perfdata under the increased system load
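Both PNP files edited here (npcd.cfg and process_perfdata.cfg) use a "KEY = value" layout, so one helper can cover them. This is a minimal Python sketch with a hypothetical function name, set_option; it assumes one setting per line and prefers an active assignment over a commented-out one before falling back to appending the setting.

```python
import re


def set_option(cfg_text, key, value):
    """Set `key = value` in a PNP-style config file's text.

    Replaces the first active assignment of `key`; if there is none,
    uncomments the first commented-out one; otherwise appends the setting.
    """
    lines = cfg_text.splitlines()
    active = re.compile(r"^\s*" + re.escape(key) + r"\s*=")
    commented = re.compile(r"^\s*#\s*" + re.escape(key) + r"\s*=")
    for pattern in (active, commented):
        for i, line in enumerate(lines):
            if pattern.match(line):
                lines[i] = f"{key} = {value}"
                return "\n".join(lines) + "\n"
    lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(set_option("# Global timeout\nTIMEOUT = 15\n", "TIMEOUT", 30), end="")
```

The same call with "load_threshold" and "80.0" handles the npcd.cfg change.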
__________________________________________
Once all modifications are made, a system restart helps ensure a clean start and a stable run.
shutdown -r 0 - Restart the server with a clean, non-Mod Gearman configuration
Current status of server after removal of Mod Gearman
On smaller servers with 2- to 4-core CPUs and 2 to 4 GB of memory, I had to throttle Nagios XI 2014 down from its defaults to keep from overloading the server. For the smaller servers, adjust the changes above as follows:
Modification to /usr/local/nagios/etc/nagios.cfg
check_workers=1 - The default of 4 workers would overload the server
max_host_check_spread=120 - Gives Nagios more time (in minutes) to ramp up the initial host checks after a cold start
max_service_check_spread=120 - Gives Nagios more time (in minutes) to ramp up the initial service checks after a cold start
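The two profiles in this post can be captured in a small lookup. This Python sketch is hypothetical (tuning_for and Tuning are made-up names), and the cut-off is an assumption: only the 2-4 core and 8-core cases were actually tested, so anything above 4 cores is simply given the 8-core profile, and the 10x load_threshold rule is assumed to apply to the small servers as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tuning:
    check_workers: int
    max_host_check_spread: int     # minutes
    max_service_check_spread: int  # minutes
    npcd_load_threshold: float


def tuning_for(cores: int) -> Tuning:
    """Pick one of the two profiles from this post by CPU core count.

    Assumption: more than 4 cores gets the 8-core profile; 5-7 cores untested.
    """
    if cores <= 4:
        return Tuning(1, 120, 120, cores * 10.0)
    return Tuning(6, 60, 60, cores * 10.0)


if __name__ == "__main__":
    for cores in (2, 4, 8):
        print(cores, tuning_for(cores))
```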
I also experimented with various values for max_concurrent_checks in nagios.cfg, but the changes appeared to have no effect. Comments on this setting are welcome.