Each of the monitoring servers (7 in total at present) reports some of these, but some are reporting about 100 and others nearer to 900.
I've currently got three master servers, and each monitoring server reports to each of these using a customised version of the script:
contrib/eventhandlers/distributed-monitoring/submit_check_result_via_nsca

The reason I believe it is a local resource issue is that I'm timing each send_nsca command to each host and recording the times in a log file.
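For anyone wanting to take the same kind of measurement, here is a rough sketch of timing each send_nsca call and appending the result to a log. It is not the customised script itself: the function names, the log format, and the config path are all assumptions for illustration. The only parts taken from send_nsca are the -H and -c options and the tab-separated input it reads on stdin.

```python
import subprocess
import time


def nsca_payload(host, service, return_code, output):
    # send_nsca reads one check result per line, fields separated by tabs.
    return f"{host}\t{service}\t{return_code}\t{output}\n"


def nsca_command(master, config="/etc/nagios/send_nsca.cfg"):
    # -H is the target master, -c the client config file.
    # The default config path here is an assumption; adjust for your install.
    return ["send_nsca", "-H", master, "-c", config]


def timed_send(command, payload, log_path, label):
    """Run command with payload on stdin and log how long it took."""
    start = time.monotonic()
    result = subprocess.run(command, input=payload, text=True,
                            capture_output=True)
    elapsed = time.monotonic() - start
    # One line per send: timestamp, label (e.g. the master's name),
    # exit code, and elapsed wall-clock time in seconds.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(f"{stamp} {label} rc={result.returncode} "
                  f"elapsed={elapsed:.3f}s\n")
    return elapsed
```

A call would look something like timed_send(nsca_command("master1"), nsca_payload("web01", "HTTP", 0, "OK"), "nsca_timing.log", "master1"), where the master, host, and log file names are made up. Sorting the log by the elapsed field should make it easy to see whether the slow sends cluster on one master.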
Most of the time, only one host is having delays, and I have managed to stop the errors by halving the number of NSCA messages being sent to the "problem" host.
Obviously, halving the number of messages means I don't get the whole picture, so it's not a real solution.
The investigation continues....
Malcolm