Monitoring Engine Queue and Weird Performance issue since
Posted: Thu Jun 06, 2013 3:42 pm
by arnab.roy
Hi guys,
Ever since the upgrade to Core 3.5, the load average on the boxes has gone bonkers. We are seeing huge spikes of events that suddenly grow and then drop off (see attached). Is this normal? We are hitting a load average of around 65 to 70 at times, which I have never seen before. This is without adding anything to the box.
Do you have any idea whether I can do something to fix this issue?
Re: Monitoring Engine Queue and Weird Performance issue sinc
Posted: Thu Jun 06, 2013 4:34 pm
by abrist
It looks like snmpwalk is the culprit. Have you created a check that uses snmpwalk?
Do you have many snmpwalk wizards open?
Do you know why there are so many snmpwalk processes running?
If your answer is "no" to all of the above questions, try:
Re: Monitoring Engine Queue and Weird Performance issue sinc
Posted: Thu Jun 06, 2013 4:45 pm
by arnab.roy
Hi Andy,
That's not the problem, as we have some checks that use snmpwalk and read data from multiple OIDs. What I am seeing is a huge number of checks running at the same time and not getting distributed properly.
Please see the event queue screenshot. It doesn't look healthy to me.
Re: Monitoring Engine Queue and Weird Performance issue sinc
Posted: Thu Jun 06, 2013 4:55 pm
by abrist
arnab.roy wrote: What I am seeing is a huge number of checks running at the same time and not getting distributed properly.
The bars are not the best measure of check distribution, but with that number of walks, Nagios will have a hard time figuring out how to schedule those walks.
How long does each walk take to perform?
Are these full tree walks?
Could these walks be performed by something less intensive, like snmpget?
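If it helps to answer the timing question, here is a rough sketch for measuring how long a single check command takes. It is not tied to any Nagios API; the helper name `time_command` and the example command line in the comments are assumptions for illustration, so substitute the exact command your check runs.

```python
import subprocess
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return the wall-clock duration of each run, in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        # A missing executable raises FileNotFoundError.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        durations.append(time.perf_counter() - start)
    return durations


# Hypothetical usage (host, community string and OID are placeholders):
# walk = time_command(["snmpwalk", "-v2c", "-c", "public", "192.0.2.10", "IF-MIB::ifTable"])
# print("min %.2fs  max %.2fs" % (min(walk), max(walk)))
```

Running the same helper against an equivalent snmpget for just the OIDs you need would show whether the walk itself is the expensive part.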
Re: Monitoring Engine Queue and Weird Performance issue sinc
Posted: Thu Jun 06, 2013 4:57 pm
by scottwilkerson
Unfortunately, that screenshot cuts off some important information. What are the service execution time numbers?