Perfdata causing poller icon to go red - checks queueing

gt71027 · Post by **gt71027** » Thu Jun 02, 2011 3:50 am

Hi All;

Spec:
We have around 500 hosts / 9000 service checks.

ESXI Redhat 5.5|2.6.18 Kern|8G Mem|4x2.5Ghz Procs
Nagios3|Centreon|NDO1.4b7|MySQL 5.0.77|RRDtool|PHPmyadmin|RDBD|eAccelerator
Centstatus DB with HEAP engine|Centreon DB with Slave|RDBD for /usr/local/nagios

We are not graphing all checks, but when we load the system with all hosts and turn on 'PERF', the poller icon turns red, Nagios latency builds up, and checks start to be delayed. Turn it off and all is fine. If we leave 'PERF' on and cut the host count by half, it runs happily.

We are using RRD rather than MySQL for graphing data, but we have placed centstatus in 'MEM' as a tuning measure. We tried turning off host-perfdata (plugin), but we are not graphing it, so there was no change; it just writes to a flat file. We only use RRD for service-perfdata.

Basically, is the only way forward to introduce a distributed architecture (multiple pollers), or could we tune more out of what we have? From the behavior we see, the bottleneck does seem to be NDO (the broker).

Another option we have thought of is to disable 'SSL' in NRPE for the checks; our network is secure.

Any thoughts or success stories of a design to this capacity would be much appreciated.
More data can be provided on request, just ask...

Gary....

mguthrie · Post by **mguthrie** » Thu Jun 02, 2011 9:30 am

Just an FYI, you're probably looking for the Nagios Core Forum, not the Nagios XI forum.

This is definitely an issue of overtaxing the CPU when the performance data begins to be processed. I might recommend making a few adjustments to see if this can get you what you need on a single system.

Increase how often check results are processed by setting the following in the main nagios.cfg:

Code:

check_result_reaper_frequency=3
max_check_result_reaper_time=10

Also, if you can afford a wider check interval, I would try increasing the average check_interval across all of your services. Every CPU has a maximum number of checks per second it can run. You'll know when you hit it because your check latency starts spiking.
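
As a rough illustration of why a wider check_interval helps, the sketch below estimates the average number of service checks per second the poller has to sustain. The 9000 figure is the service count quoted earlier in the thread; the check_interval values (1, 5 and 10) are assumptions chosen for the example, not settings taken from this system, and the helper name is hypothetical.

```python
def checks_per_second(num_checks, check_interval, interval_length=60):
    """Average checks per second needed to run every check once per interval.

    check_interval is expressed in units of interval_length seconds,
    which is how Nagios interprets it.
    """
    period_seconds = check_interval * interval_length
    return num_checks / period_seconds


services = 9000  # service count quoted in the first post

# Assumed check_interval values, purely for comparison.
for interval in (1, 5, 10):
    rate = checks_per_second(services, interval)
    print(f"check_interval={interval}: about {rate:.1f} checks/s")
```

Doubling the interval halves the required rate, which is the same effect as the halved host count described in the first post.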

gt71027 · Post by **gt71027** » Fri Jun 03, 2011 1:57 am

Thanks, I'll try these tweaks. We did not have max_check_result_reaper_time=10 in nagios.cfg, and I could not find it in the Centreon front end either. interval_length is 60 at the moment, if that is the field you described.
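
One quick way to confirm which of these directives are actually present in a generated nagios.cfg is a small script along these lines. It is only a sketch: it assumes the file uses plain `name=value` lines with `#` comments, the sample text is a made-up excerpt rather than this system's real config, and `read_directives` is a hypothetical helper.

```python
def read_directives(cfg_text, names):
    """Return {name: value or None} for the requested directives."""
    found = {name: None for name in names}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comment lines, and anything that is not name=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in found:
            # This sketch simply keeps the last occurrence it sees.
            found[key] = value.strip()
    return found


# Hypothetical excerpt; in practice read the text from the real nagios.cfg.
sample = """\
# main configuration (made-up excerpt)
interval_length=60
check_result_reaper_frequency=3
"""

wanted = [
    "check_result_reaper_frequency",
    "max_check_result_reaper_time",
    "interval_length",
]
result = read_directives(sample, wanted)
for name in wanted:
    value = result[name]
    print(name, "=", value if value is not None else "(not set in this file)")
```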

mguthrie · Post by **mguthrie** » Fri Jun 03, 2011 10:48 am

Do not change interval_length from 60. It sets the length of one time unit to 60 seconds, and all of your other intervals will change if you change this value.
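
To make the point concrete, here is a small sketch of how interval_length scales the other timing settings. The check_interval, retry_interval and notification_interval values are assumptions for illustration only, and `interval_seconds` is a hypothetical helper.

```python
def interval_seconds(value, interval_length=60):
    """Convert a Nagios interval setting (in time units) to seconds."""
    return value * interval_length


# Assumed example values, not taken from the system in this thread.
settings = {
    "check_interval": 5,
    "retry_interval": 1,
    "notification_interval": 30,
}

for length in (60, 30):
    print(f"interval_length={length}")
    for name, value in settings.items():
        print(f"  {name}={value} -> {interval_seconds(value, length)} seconds")
```

Halving interval_length halves every one of these at once, which is why the change belongs in check_interval on the services rather than in interval_length.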

If you're using Centreon, there may be some differences that we're unfamiliar with; you may need to post to their support for a clearer answer.

Also, I'm going to move this post to the Nagios Core Forum.
