Re: Load Spikes on 7 Hour Intervals
Posted: Thu Dec 01, 2016 5:58 pm
Support for Nagios products and services
https://support.nagios.com/forum/
avandemore wrote: have you considered offloading the db?
Considered that, but we try to avoid introducing new dependencies into our production monitoring environment as much as possible. Offloading the DB would introduce additional network and storage dependencies. If we were to offload the DB, could you provide guidance on recommended specs for a DB host for a monitoring environment of our size (2,038 hosts, 14,740 services, 14,000 active checks every 5 minutes)?
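For anyone sizing a DB host from the figures above, it can help to convert them into a sustained check rate first. This is only a rough sketch: the function name is made up for illustration, and it assumes the 14,000 active checks are spread evenly across the 5-minute window, which real scheduling will not do exactly.

```python
def checks_per_second(active_checks: int, interval_minutes: float) -> float:
    """Average check rate, assuming checks are spread evenly over the interval."""
    return active_checks / (interval_minutes * 60)


# Figures quoted in the post: 14,000 active checks every 5 minutes.
rate = checks_per_second(14_000, 5)
print(f"{rate:.1f} active checks per second on average")
```

Each check result produces at least one write through NDOUtils, so this average is a lower bound on DB write activity, not a full sizing estimate.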
avandemore wrote: Also you could consider setting the values in XI > Admin > Performance Settings > Database > NDOUtils to the minimum useful setting.
I have made the following adjustments:
Code:
Nagios XI Database
Increased Optimize Interval: from 60 to 240
NDOUtils Database
Reduced Max External Commands Age: from 7 to 1
Reduced Max Log Entries Age: from 15 to 3
Reduced Max Notifications Age: from 15 to 3
Increased Optimize Interval: from 60 to 240
NagiosQL Database
Reduced Max Logbook Age: from 2880 to 480
Increased Optimize Interval: from 60 to 240
avandemore wrote: Enabling XI > Admin > Performance Settings > Backend Cache may also help, but be sure to understand the ramifications of turning that on. It could easily be unsuitable for your environment.
We require real-time data, so this is not an option.
avandemore wrote: Also how many cores are on this system? lscpu |grep -i socket multiply the values.
12
Code:
lscpu |grep -i socket
Core(s) per socket: 6
Socket(s): 2
avandemore wrote: The recommended value for load_threshold in /usr/local/nagios/etc/pnp/npcd.cfg is 4 * the above value.
I increased the load_threshold to 48 as recommended and restarted the npcd service.
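For reference, the arithmetic behind the 48 can be sketched as follows. The helper names are hypothetical, and the sample text is the lscpu output posted above. Note that "Core(s) per socket" times "Socket(s)" counts physical cores only, not hyperthreads.

```python
# Sample text copied from the lscpu output in this post.
LSCPU_OUTPUT = """\
Core(s) per socket:    6
Socket(s):             2
"""


def parse_lscpu(text):
    """Turn 'Key: value' lines from lscpu into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def recommended_load_threshold(text, multiplier=4):
    """Return multiplier * (cores per socket * sockets), per the advice quoted above."""
    fields = parse_lscpu(text)
    cores = int(fields["Core(s) per socket"]) * int(fields["Socket(s)"])
    return multiplier * cores


print(recommended_load_threshold(LSCPU_OUTPUT))  # 4 * (6 * 2) = 48
```

The result is the value to set for load_threshold in /usr/local/nagios/etc/pnp/npcd.cfg, followed by a restart of the npcd service.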
avandemore wrote: Also I assume you've gone through this document in the past? https://assets.nagios.com/downloads/nag ... ios-XI.pdf
Correct. However, I did review it and made the following additional changes:
avandemore wrote: During these restarts, does the information show up in Core?
We don't ever use the Core UI to monitor, but I just applied a config in CCM while watching Core to test. Here are the results:
Time from receiving the "Configuration applied successfully" message until the Operations Center component stopped displaying alerts for hosts/services that were acknowledged/downtimed: 3 minutes, 10 seconds
The Nagios Core "Host Information" and "Host Status Details For All Host Groups" web pages accurately reflected acknowledgements and downtimes for the entire duration of the 3:10.
drcentner wrote: We don't ever use the Core UI to monitor, but I just applied a config in CCM while watching Core to test. Here are the results:
This info is just for systems that do a LARGE amount of checks. This behavior makes sense. Nagios Core itself isn't dependent on the DB, while XI and many of its components are.