Re: Monitoring Engine Event Queue anomaly?

Posted: Fri Jun 14, 2013 12:50 pm
by abrist
Sam is out until Monday. What commands/deletions did he ask you to run?

Posted: Fri Jun 14, 2013 5:31 pm
by Box293
They were:

tail -n 200 /usr/local/nagiosxi/var/dbmaint.log

ls -l /usr/local/nagios/var/spool/checkresults/
I've PM'd you the output from these commands.

The deletions were in relation to:
As for the checkresults folder, there were about 1400 files in it when I had a look. Interestingly, 1070 of them had been created in 2012, 2011 and 2010. When watching the folder, files created in 2013 were being processed correctly and disappearing soon thereafter. So I've just deleted those 1070 files and I'm waiting for the database maintenance task to complete.
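A minimal Python sketch of that clean-up, assuming the spool path from the `ls -l` command above and using each file's modification time as a stand-in for its creation date (a file's creation time is generally not available through `stat` on Linux). The 2013-01-01 cutoff and the helper names `stale_files` and `purge` are hypothetical; `purge` only prints by default, so the list can be reviewed before anything is deleted.

```python
import os
from datetime import datetime

# Spool directory from the `ls -l` command in this thread; adjust for your install.
CHECKRESULTS_DIR = "/usr/local/nagios/var/spool/checkresults"


def stale_files(directory, cutoff):
    """Return sorted paths of regular files last modified before `cutoff`.

    Modification time stands in for creation time, which `stat` on Linux
    does not generally report.
    """
    cutoff_ts = cutoff.timestamp()  # a naive datetime is read as local time
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False) and entry.stat().st_mtime < cutoff_ts:
                stale.append(entry.path)
    return sorted(stale)


def purge(paths, dry_run=True):
    """Print (dry run) or delete each path; return how many were handled."""
    for path in paths:
        if dry_run:
            print("would delete", path)
        else:
            os.remove(path)
    return len(paths)


if __name__ == "__main__":
    if os.path.isdir(CHECKRESULTS_DIR):
        old = stale_files(CHECKRESULTS_DIR, datetime(2013, 1, 1))
        print(len(old), "files last modified before 2013-01-01")
        purge(old)  # dry run: nothing is deleted until dry_run=False
```

Keeping the listing and the deletion in separate steps means the count can be compared against what `ls -l` shows before committing to `purge(old, dry_run=False)`.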

Posted: Mon Jun 17, 2013 9:09 am
by slansing
Whether your MySQL server is offloaded or not, what is its timezone set to? Please see this page for information on how to check it:

http://dev.mysql.com/doc/refman/5.0/en/ ... pport.html

Also, please share the output of the following:

hwclock

date
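As a cross-check alongside `date`, a small Python sketch (assuming Python 3 is available on the XI server) can print one instant in UTC and in the system's local zone, together with the zone abbreviation and UTC offset, so an unexpected zone or offset stands out. It only reads the operating-system clock; the hardware clock that `hwclock` reports is not consulted. The function name `clock_report` is hypothetical.

```python
from datetime import datetime, timezone


def clock_report(now_utc=None):
    """Describe one instant in UTC and in the system's configured local zone."""
    if now_utc is None:
        now_utc = datetime.now(timezone.utc)
    local = now_utc.astimezone()  # no argument: convert to the system's local zone
    offset_hours = local.utcoffset().total_seconds() / 3600
    return {
        "utc": now_utc.strftime("%Y-%m-%d %H:%M:%S UTC"),
        "local": local.strftime("%Y-%m-%d %H:%M:%S"),
        "zone": local.tzname(),
        "utc_offset_hours": offset_hours,
    }


if __name__ == "__main__":
    for key, value in clock_report().items():
        print(f"{key:>16}: {value}")
```

If the printed offset is not the one you expect for your location, the zone configuration (rather than the clock itself) is the first thing to look at.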

Posted: Mon Jun 17, 2013 5:21 pm
by Box293
Just as an FYI, our MySQL server is not offloaded.

mysql> SELECT @@global.time_zone, @@session.time_zone;
+--------------------+---------------------+
| @@global.time_zone | @@session.time_zone |
+--------------------+---------------------+
| SYSTEM             | SYSTEM              |
+--------------------+---------------------+
1 row in set (0.00 sec)

# hwclock
Tue 18 Jun 2013 08:19:12 AM EST  -0.015970 seconds

# date
Tue Jun 18 08:19:15 EST 2013

Posted: Tue Jun 18, 2013 1:00 pm
by abrist
I do not believe that date is correct for the eastern time zone. You posted at 4:20pm CDT, but the time reported by the server is 08:19:15 EST. Do you use NTP? If not, I suggest you do so if it complies with your internal security policies. If it does not, you should set the date/time to the correct time manually.
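To see roughly how far a server's clock is from NTP time without changing anything, a single SNTP query is enough. The Python sketch below assumes outbound UDP port 123 is allowed and uses `pool.ntp.org` only as an example server; the helper names are hypothetical, and the estimate does not correct for asymmetric network delay, so `ntpd` or `ntpdate` remains the tool for actually keeping the clock in sync.

```python
import socket
import struct
import time

# Seconds from the NTP epoch (1900-01-01) to the Unix epoch (1970-01-01).
NTP_EPOCH_DELTA = 2208988800


def parse_transmit_time(packet):
    """Return the server's transmit timestamp (Unix seconds) from an NTP reply."""
    if len(packet) < 48:
        raise ValueError("NTP reply shorter than 48 bytes")
    # Transmit timestamp: bytes 40-47, 32-bit seconds + 32-bit fraction, big-endian.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_DELTA + fraction / 2**32


def query_offset(server="pool.ntp.org", timeout=5.0):
    """Approximate offset in seconds (server clock minus local clock), one SNTP query.

    Uses the midpoint of the round trip as the local reference, so network
    asymmetry is not corrected; good enough to spot minutes or hours of error.
    """
    request = b"\x1b" + bytes(47)  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(48)
        received = time.time()
    return parse_transmit_time(reply) - (sent + received) / 2
```

For example, `print(f"offset: {query_offset():+.3f} s")` prints a positive value when the NTP server's clock is ahead of the local one; an offset close to a whole number of hours points at a timezone problem rather than clock drift.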

Posted: Tue Jun 18, 2013 9:07 pm
by Box293
I think you might be right about the ntp stuff.

This Nagios XI VM has been in production for a very long time (going back to the 2009 version). Back then the steps for configuring the timezone were not exactly clear.

Since then I have created an SOP for configuring the timezone on a Nagios XI VM and published it. Funnily enough, it appears that I never went back over our production server to make sure it had been implemented. The SOP is published here.

I compared the steps in my SOP with how our Nagios XI production server was configured, and the server did not match them. So I have gone through my SOP and made sure the production server now matches it.

It would be good if you could confirm whether the timezone steps in this SOP are the steps the Nagios Enterprises team would use to configure the timezone and NTP. If they are not what you would do, what needs to change for them to be "correct"?

I'll report back in a couple of hours to let you know whether making these changes and rebooting our Nagios XI VM has resolved the problem.

Posted: Wed Jun 19, 2013 1:43 am
by Box293
Great news, this appears to have resolved the problem.

While watching the VM's CPU usage over a few hours, I noticed two things:
  • There is a high CPU spike every hour that lasts about one minute but has no noticeable impact on Nagios XI.
  • There is also a second hourly load that lasts about five minutes. This is not a high CPU spike; it's more like an additional 10-20% CPU usage for approximately five minutes. While this is happening, the same behaviour shows up when watching Scheduled Events Over Time, but it only lasts for five minutes and the number of events builds up to about 2500 - 3000. I also notice that this time window shifts five minutes later every hour. Whatever background process is running appears to schedule its next occurrence after it completes (to run again in one hour's time).
I would say that this problem is now resolved. I would like to see if what I have described above is normal behaviour.
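The drifting window described above is what you would expect from a job that books its next run a fixed interval after it finishes, rather than on a fixed clock schedule the way cron does. The thread does not identify the job, so this Python sketch only models that pattern, taking the five-minute runtime and one-hour interval from the post; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def self_rescheduling_starts(first_start, runtime, interval, runs):
    """Start times of a job that books its next run `interval` after it finishes."""
    starts = [first_start]
    for _ in range(runs - 1):
        # The next run is measured from completion, so the runtime accumulates.
        starts.append(starts[-1] + runtime + interval)
    return starts


def fixed_schedule_starts(first_start, interval, runs):
    """Start times of a job launched on a fixed clock schedule, as cron does."""
    return [first_start + i * interval for i in range(runs)]


if __name__ == "__main__":
    t0 = datetime(2013, 6, 19, 1, 0)
    five_min, hour = timedelta(minutes=5), timedelta(hours=1)
    for drifting, fixed in zip(self_rescheduling_starts(t0, five_min, hour, 4),
                               fixed_schedule_starts(t0, hour, 4)):
        print(drifting.strftime("%H:%M"), fixed.strftime("%H:%M"))
```

Printed side by side, the self-rescheduling job starts at 01:00, 02:05, 03:10 and 04:15, while the cron-style job stays on the hour, which matches the five-minutes-per-hour shift.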

Also, here is a screenshot of the past day's CPU usage of this Nagios XI VM; it's pretty clear when I made the NTP changes and rebooted the VM.
[Attachment: 1 Day CPU Summary After NTP changes.png]
Thanks very much for helping get this resolved.

Posted: Wed Jun 19, 2013 10:21 am
by abrist
Box293 wrote: I would say that this problem is now resolved. I would like to see if what I have described above is normal behaviour.
I would label this as normal behavior. The spike is most likely attributable to one of the several cron jobs that XI runs hourly.

I am very glad that this issue has been resolved. I will close the thread; if you need to add any more information to it, PM me and I will reopen it. Otherwise, enjoy the rest of the week and happy monitoring!