Re: OCSP problem after upgrade to Nagios 4.0 (NSCA)
Posted: Fri Nov 08, 2013 11:16 am
The master system is monitoring about 3000 services, and each of these is reported on every 10 minutes or so, so I'd guesstimate about 1500 results per 5 minutes.
Each of the monitoring servers (7 in total at present) reports some of these, but some report about 100 services and others nearer to 900.
I've currently got three master servers, and each monitoring server reports to all three, using a customised version of the contrib/eventhandlers/distributed-monitoring/submit_check_result_via_nsca script supplied with the Nagios source distribution.
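As a rough back-of-envelope check of those figures, the average send rate can be sketched as below. It assumes one send_nsca invocation per check result per master and an even spread over the 10-minute interval; neither is confirmed in the post, and the function name is made up for illustration.

```python
# Figures quoted in the post.
INTERVAL_MIN = 10   # each service reports roughly every 10 minutes
MASTERS = 3         # every result is sent to each of the three masters


def sends_per_second(service_count, masters=MASTERS, interval_min=INTERVAL_MIN):
    """Average send_nsca invocations per second.

    Assumption: one invocation per check result per master, spread evenly.
    """
    return service_count * masters / (interval_min * 60)


if __name__ == "__main__":
    print("all monitoring servers combined:", sends_per_second(3000))
    print("busiest server (~900 services): ", sends_per_second(900))
    print("lightest server (~100 services):", sends_per_second(100))
    # A single master only receives one copy of each result.
    print("received by each master:        ", sends_per_second(3000, masters=1))
```

On those assumptions the busiest monitoring server averages about 4.5 sends per second and each master receives about 5 results per second. Real traffic is likely burstier than the average suggests.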
The reason I believe it is a local resource issue is that I'm timing each send_nsca command to each host and recording the timings in a log file.
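A minimal sketch of that kind of timing wrapper is shown here. It is not the customised script from the post. The function name, the log layout (timestamp, host, exit code, elapsed seconds) and the default install paths are all assumptions; only the -H and -c options and the tab-separated stdin format come from send_nsca itself.

```python
import subprocess
import time


def timed_send(host, payload, log_path, cmd=None):
    """Pipe one passive-check line to send_nsca for `host` and log the duration.

    `payload` is a "host<TAB>service<TAB>return_code<TAB>output\n" line.
    """
    if cmd is None:
        # Assumed default install paths; adjust for the local system.
        cmd = ["/usr/local/nagios/bin/send_nsca", "-H", host,
               "-c", "/usr/local/nagios/etc/send_nsca.cfg"]
    start = time.monotonic()
    result = subprocess.run(cmd, input=payload.encode(),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    elapsed = time.monotonic() - start
    # Hypothetical log layout: epoch, destination host, exit code, seconds.
    with open(log_path, "a") as log:
        log.write(f"{int(time.time())} {host} {result.returncode} {elapsed:.3f}\n")
    return elapsed
```

Calling it once per master for each check result gives one log line per send, which is enough to compare the masters against each other.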
Most of the time only one host shows delays, and I have managed to stop the errors by halving the number of NSCA messages sent to the "problem" host.
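One way to confirm which host is the slow one is to summarise the timing log per destination. The sketch below assumes each log line holds four whitespace-separated fields (timestamp, destination host, exit code, elapsed seconds). That layout is hypothetical, since the post does not show the real log, and the host names in the sample are invented.

```python
from collections import defaultdict
from statistics import mean


def summarise(lines):
    """Return {host: (count, mean_seconds, max_seconds)} from timing log lines."""
    timings = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        _stamp, host, _rc, seconds = parts
        try:
            value = float(seconds)
        except ValueError:
            continue  # skip lines whose duration is not a number
        timings[host].append(value)
    return {h: (len(v), mean(v), max(v)) for h, v in timings.items()}


if __name__ == "__main__":
    sample = [
        "1383909360 master1 0 0.10",
        "1383909361 master2 0 2.00",
        "1383909362 master1 0 0.30",
    ]
    stats = summarise(sample)
    # Pick the destination with the highest mean send time.
    slowest = max(stats, key=lambda h: stats[h][1])
    print(slowest, stats[slowest])
```

Comparing the mean and maximum per host before and after halving the message count would show whether the delays scale with load on that one master.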
Obviously, halving the number of messages means I don't get the whole picture, so it's not a real solution.
The investigation continues...
Malcolm