Monitoring Engine Event Queue anomaly?
Posted: Wed Jun 05, 2013 7:00 pm
by Box293
Was there an answer to this?
Our scheduled events climb to about 4000 every hour and then drop back to about 300. From what I understand, this happens around the time the hourly DB scripts run.
CPU usage on the XI host goes up when this happens. This screenshot shows CPU usage for this XI host over the past day; you can see the hourly job.
CPU 1 day summary.png
We only have active service checks, no passive.
Nagios XI 2012R1.7 VM running on ESXi 5.1.
Re: Monitoring Engine Event Queue anomaly?
Posted: Thu Jun 06, 2013 12:48 pm
by lmiltchev
Box293 wrote:
Was there an answer to this?
...
We only have active service checks, no passive.
We haven't received a response from the customer since 04/21/2013, so I am not sure whether this was resolved. His case was different, though: the majority of his checks were passive checks, submitted at similar times.
Re: Monitoring Engine Event Queue anomaly?
Posted: Fri Jun 07, 2013 12:22 am
by Box293
I'll put some notes together about this and come back to you with some helpful information.
I have a suggestion, but it requires some explaining first.
We use the Nagios XI Server Monitoring Wizard on a test server to monitor our production server. This lets us know if the production server goes down or something is wrong with it (it's really handy). In particular, I am talking about the "Nagios XI Jobs" service.
Normally the status is "All jobs are running ok". However, a condition can occur where the scheduled events over time build up WHILE a backup of the VM is running, which in turn causes the Nagios XI Jobs service to report "Nagios XI Jobs;UNKNOWN;SOFT;3;Database Maintenance (dbmaint) stale (1203 seconds old), Database Maintenance (dbmaint) stale (1203 seconds old)".
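To show what that alert text actually carries, here is a minimal Python sketch that splits the quoted `service;state;state type;attempt;output` string and pulls each stale job and its age out of the output. It assumes the string has exactly the shape quoted above (no host field, no semicolons inside the output); `JobAlert` and `parse_job_alert` are hypothetical names, not part of Nagios XI.

```python
import re
from typing import List, NamedTuple, Tuple


class JobAlert(NamedTuple):
    service: str
    state: str
    state_type: str
    attempt: int
    stale_jobs: List[Tuple[str, str, int]]  # (job name, job id, seconds old)


# Matches fragments like "Database Maintenance (dbmaint) stale (1203 seconds old)".
STALE_RE = re.compile(
    r"\s*(?P<name>[^,(]+?) \((?P<job>\w+)\) stale \((?P<secs>\d+) seconds old\)"
)


def parse_job_alert(line: str) -> JobAlert:
    """Split 'service;state;type;attempt;output' and extract every stale job."""
    # maxsplit=4 keeps the free-text output (which contains commas) in one piece.
    service, state, state_type, attempt, output = line.split(";", 4)
    stale = [
        (m.group("name"), m.group("job"), int(m.group("secs")))
        for m in STALE_RE.finditer(output)
    ]
    return JobAlert(service, state, state_type, int(attempt), stale)


if __name__ == "__main__":
    alert = parse_job_alert(
        "Nagios XI Jobs;UNKNOWN;SOFT;3;"
        "Database Maintenance (dbmaint) stale (1203 seconds old), "
        "Database Maintenance (dbmaint) stale (1203 seconds old)"
    )
    print(alert.state, max(secs for _, _, secs in alert.stale_jobs))  # UNKNOWN 1203
```

Collected over time, the extracted age would give a rough picture of how late dbmaint runs, although it only appears while the service is already alerting.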
My suggestion is that you incorporate performance data into this service so we can observe over time how long the database maintenance jobs take to run, OR define a new service in the wizard that tracks the database maintenance durations so we can observe them in pretty graphs.
I hope this makes sense.
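A minimal sketch of that suggestion as a standalone check, assuming the start and end times of the last dbmaint run are available as Unix timestamps (how to get them out of Nagios XI is not covered here). It follows the standard Nagios plugin conventions: exit codes 0 to 3, and performance data after a `|` in the form `label=value[UOM];warn;crit;min;max`. The script name, the thresholds (300/900 seconds for duration, 900 seconds for staleness) and the perfdata labels are all hypothetical.

```python
import sys
import time
from typing import List, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_dbmaint(last_start: float, last_end: float, now: float,
                  max_age: int = 900, warn_dur: int = 300,
                  crit_dur: int = 900) -> Tuple[int, str]:
    """Return (exit code, plugin output) for the most recent dbmaint run.

    All thresholds are hypothetical defaults, in seconds.
    """
    age = int(now - last_end)             # how long ago the job finished
    duration = int(last_end - last_start)  # how long the job took
    if age > max_age:
        code, text = UNKNOWN, f"Database Maintenance (dbmaint) stale ({age} seconds old)"
    elif duration >= crit_dur:
        code, text = CRITICAL, f"dbmaint took {duration} seconds"
    elif duration >= warn_dur:
        code, text = WARNING, f"dbmaint took {duration} seconds"
    else:
        code, text = OK, f"dbmaint took {duration} seconds"
    # Performance data: duration carries warn/crit thresholds, age only a minimum.
    perfdata = f"dbmaint_duration={duration}s;{warn_dur};{crit_dur};0 dbmaint_age={age}s;;;0"
    return code, f"{STATE_NAMES[code]} - {text} | {perfdata}"


def main(argv: List[str]) -> int:
    """Hypothetical CLI: check_dbmaint.py START_EPOCH END_EPOCH."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_dbmaint.py START_EPOCH END_EPOCH")
        return UNKNOWN
    code, output = check_dbmaint(float(argv[1]), float(argv[2]), time.time())
    print(output)
    return code


# Only act as a plugin when arguments are supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

For example, `check_dbmaint(1000, 2200, 2230)` describes a 20-minute run checked 30 seconds after it finished and returns `CRITICAL - dbmaint took 1200 seconds | dbmaint_duration=1200s;300;900;0 dbmaint_age=30s;;;0`. The `dbmaint_duration` series is the one that would produce the graph being asked for.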
Re: Monitoring Engine Event Queue anomaly?
Posted: Fri Jun 07, 2013 10:10 am
by lmiltchev
It makes sense, and I think it's a good idea. Please post a feature request on our bug tracker so that it won't "fall through the cracks".

Re: Monitoring Engine Event Queue anomaly?
Posted: Tue Jun 11, 2013 4:15 pm
by Box293
OK, so here's some more information.
As quoted from another post:
Here is the log file with the output from these commands.
putty-2013-06-12.log
Also, here is a better screenshot of the problem when it occurs.
Monitoring Engine Status.png
One other observation is that when the scheduled events over time build up like this, things like dashlets do not display properly. It's almost as if the dashlet cannot access the database to get the service object (or something like that).
I'll post back here in a couple of hours with an update on whether running these two repair/cleanup procedures has resolved the problem.
Re: Monitoring Engine Event Queue anomaly?
Posted: Tue Jun 11, 2013 4:31 pm
by Box293
Re: Monitoring Engine Event Queue anomaly?
Posted: Tue Jun 11, 2013 4:49 pm
by lmiltchev
Thanks for the post, Troy!
Re: Monitoring Engine Event Queue anomaly?
Posted: Tue Jun 11, 2013 4:57 pm
by Box293
OK, so the two repair/cleanup procedures did not solve the problem.
The database maintenance job just ran and did the same thing: scheduled events over time went up to 4000+ while the job ran over a 20-minute window.
Re: Monitoring Engine Event Queue anomaly?
Posted: Tue Jun 11, 2013 6:31 pm
by scottwilkerson
Troy,
There were some issues introduced in 2012R1.7 that could be causing what you are seeing. They affected how we query objects in the DB and caused us to release 2012R1.8 shortly after.
I also like the feature request; when I get some time, I'll try to get that in there...
Re: Monitoring Engine Event Queue anomaly?
Posted: Tue Jun 11, 2013 10:16 pm
by Box293
Thanks for that Scott.
A couple of hours ago I upgraded to 2012R2.2; however, the problem has not been resolved.
The database maintenance job has run twice since, and nothing seems to have changed: scheduled events over time went up to 4000+ while the job ran over a 20-minute window.