Perfdata causing poller icon to go red - checks queueing
Posted: Thu Jun 02, 2011 3:50 am
Hi all,
Spec:
We have around 500 hosts / 9000 service checks.
ESXi guest: Red Hat 5.5 | 2.6.18 kernel | 8 GB RAM | 4 x 2.5 GHz CPUs
Nagios 3 | Centreon | NDO 1.4b7 | MySQL 5.0.77 | RRDtool | phpMyAdmin | RDBD | eAccelerator
Centstatus DB with HEAP engine | Centreon DB with slave | RDBD for /usr/local/nagios
We are not graphing all checks, but when we load the system with all hosts and turn on 'PERF', the poller icon turns red, Nagios latency builds up and checks start to be delayed. Turn it off and all is fine. If we leave 'PERF' on and halve the host count, it runs happily.
We are using RRD, not MySQL, for graphing data, and we have placed centstatus in 'MEM' as a tuning step. We tried turning off host-perfdata (plugin), but we are not graphing it, so there was no change; it just writes to a flat file. RRD is only used for service-perfdata.
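For a rough sense of scale, here is a back-of-envelope sketch of how many check results (and therefore perfdata lines) the broker has to absorb per second. The 5-minute check interval is an assumption for illustration, not our measured configuration, and the function name is made up. It also assumes services are spread evenly across hosts, so halving the hosts halves the service checks.

```python
def results_per_second(service_checks: int, interval_seconds: float) -> float:
    """Average number of check results produced per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return service_checks / interval_seconds


# Assumption: every service is checked once per 5 minutes (hypothetical value).
ASSUMED_INTERVAL_SECONDS = 300

full_load = results_per_second(9000, ASSUMED_INTERVAL_SECONDS)
half_load = results_per_second(4500, ASSUMED_INTERVAL_SECONDS)

print(f"all hosts:  {full_load:.1f} results/s")
print(f"half hosts: {half_load:.1f} results/s")
```

Under that assumed interval, the rate the system can sustain with 'PERF' on would sit somewhere between the half-load and full-load figures. A shorter real interval would raise both numbers proportionally.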
Basically, is the only way forward to introduce a distributed architecture (multiple pollers), or could we tune more out of what we have? Judging by the behaviour, the bottleneck does seem to be NDO (the broker).
Another option we have thought of is disabling SSL in NRPE for the checks; our network is secure.
Any thoughts or success stories of a design at this capacity would be much appreciated.
More data can be provided on request, just ask...
Gary....