Monitoring Engine Event Queue bottlenecks occasionally

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
paul.jobb
Posts: 167
Joined: Tue Aug 02, 2011 4:37 pm

Post by paul.jobb »

I'm not exactly sure of the version of our ESXi environment; I would guess 4.1, though. We also decided to install VMware Tools on our Nagios servers and use the tools for clock synchronization in place of NTP.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Post by scottwilkerson »

I have a feeling that using VMware Tools for the time sync could be causing the issue. Would you be willing to change that back to NTP?
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia
Post by Box293 »

High five paul.jobb,
Problem appears to be solved.

I changed the retention for my database log entries and state history to 182 days. Now the problem no longer occurs. When I look at the VM performance, disk I/O and CPU usage have dropped dramatically during the hourly db optimization task.

I'm thinking I might look at implementing some mysql monitoring checks for Nagios so I can get information like duration of db optimisation jobs into some nice pretty graphs.
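A minimal sketch of what such a check might look like, assuming the duration of the optimization job has already been measured by some other means (the thread does not say how). The function name `check_duration`, the `duration` perfdata label, and the thresholds are all hypothetical; only the exit codes and the `label=value;warn;crit` performance data layout follow the usual Nagios plugin conventions.

```python
# Hypothetical Nagios-style check: names and thresholds are illustrative only.

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_duration(seconds, warn, crit):
    """Return (exit_code, output_line) for a measured job duration in seconds."""
    if seconds < 0 or warn > crit:
        return UNKNOWN, "UNKNOWN - invalid duration or thresholds"
    if seconds >= crit:
        state = CRITICAL
    elif seconds >= warn:
        state = WARNING
    else:
        state = OK
    # Text after the '|' is performance data, which is what gets graphed.
    perfdata = f"duration={seconds:.1f}s;{warn};{crit}"
    message = f"{STATE_NAMES[state]} - db optimization took {seconds:.1f}s | {perfdata}"
    return state, message
```

A wrapper script would print the returned line and exit with the returned code, for example `check_duration(45, 60, 120)` for a 45 second job with a 60 second warning and 120 second critical threshold.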

FYI #1 The night before, I tried removing one vCPU so I only had two. That just made things really bad, so instead I added another one to bring my total vCPU count to 4. This did not fix the problem, but it did mask the issue: scheduled events would only get up to about 1500. CPU ready time also increased for this VM.

FYI #2 One side effect of this problem is that it was somehow causing MRTG to lose data, and I ended up with gaps in my graphs. Since I made the changes last night, the graphs are looking great. The screenshot shows what I mean: the graphs I am pointing to are from yesterday, and the data on the right is from today.
Effects on MRTG.png
paul.jobb
Posts: 167
Joined: Tue Aug 02, 2011 4:37 pm

Post by paul.jobb »

That's good to hear. That sounds similar to the behavior I was seeing with the db optimization process.

With regard to NTP, I was using it until recently. When we stepped down from 4 vCPUs to 2 vCPUs, we also installed VMware Tools, disabled NTP, and enabled the VMware Tools time sync. It seems likely that my VM environment is under-resourced at certain times, so the thought was that the tools might be better at smoothing the clock during those times. A few weeks ago I saw the message below in my Nagios log file; we have had some issues with latent checks and monitoring stopping at certain points. All my checks are farmed off to Gearman workers, so it's just a matter of keeping them scheduled.

[1354423407] Warning: A system time change of 0d 0h 36m 15s (forwards in time) has been detected. Compensating...
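To see how often jumps like this happen, the log can be scanned for the same warning. The following is a rough sketch, with the pattern based only on the single line quoted above; the "backwards" variant and the helper name `parse_time_change` are assumptions, and other Nagios versions may word the message differently.

```python
import re

# Pattern modelled on the one log line quoted above (an assumption, not a spec).
TIME_CHANGE = re.compile(
    r"^\[(?P<ts>\d+)\] Warning: A system time change of "
    r"(?P<d>\d+)d (?P<h>\d+)h (?P<m>\d+)m (?P<s>\d+)s "
    r"\((?P<direction>forwards|backwards) in time\)"
)


def parse_time_change(line):
    """Return (epoch_timestamp, jump_in_seconds, direction), or None if no match."""
    m = TIME_CHANGE.match(line)
    if m is None:
        return None
    jump = (
        int(m["d"]) * 86400
        + int(m["h"]) * 3600
        + int(m["m"]) * 60
        + int(m["s"])
    )
    return int(m["ts"]), jump, m["direction"]
```

Running `parse_time_change` over each line of the log file and keeping the non-`None` results gives a list of clock jumps that can be compared against the times when checks became latent.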