Monitoring Engine Event Queue anomaly?

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Monitoring Engine Event Queue anomaly?

Post by lmiltchev »

We are still trying to pinpoint the issue. There is also another customer experiencing a similar problem. As soon as we have a possible solution, we will let you know.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Monitoring Engine Event Queue anomaly?

Post by Box293 »

No problem. Let me know if there is any other information you require, or if you want to establish a remote session to have a look at our server.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Monitoring Engine Event Queue anomaly?

Post by lmiltchev »

Run the following commands and show us the output:

Code: Select all

tail -n 200 /var/log/messages
tail -n 200 /var/log/mysqld.log
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Monitoring Engine Event Queue anomaly?

Post by scottwilkerson »

Troy,

Searching through the forum, I found a very similar post of yours from about 6 months ago that was related to time syncing:
http://support.nagios.com/forum/viewtop ... =10#p40970

Could this problem be creeping back up?
Former Nagios employee
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Monitoring Engine Event Queue anomaly?

Post by Box293 »

I've just run the two commands and have sent you a private message with the output (there is some client-related data that I would prefer not to post publicly).

Regarding time syncing: I have checked, and the VM does NOT have VMware Tools syncing time with the ESXi host. CentOS is configured to use NTP, and when I ran the date command the correct date and time were displayed.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Monitoring Engine Event Queue anomaly?

Post by slansing »

Hmm, we are hoping to get into a remote session today with another client experiencing this issue. We will let you know what we dig up!
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Monitoring Engine Event Queue anomaly?

Post by lmiltchev »

Troy, do you remember if your issues started around 04/04/2013?

Run the following command and send me the output via PM:

Code: Select all

tail -n 200 /usr/local/nagiosxi/var/dbmaint.log
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Monitoring Engine Event Queue anomaly?

Post by slansing »

Scott resolved the issue with the other client. The problem was an offloaded MySQL database whose clock was not synced with the XI server's (off by a minute, give or take), and a huge number of backed-up checkresults in:

Code: Select all

/usr/local/nagios/var/spool/checkresults/
Please let us know if either of these is the case for you. For that client, the pile-up of check results created the same kind of giant stack in the event queue that you are experiencing.
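
Both conditions can be checked from the shell. What follows is a minimal sketch, not an official Nagios procedure. The function names clock_skew_seconds and count_checkresults are made up for illustration, dbhost in the usage comment is a placeholder, and the mysql call assumes the client can already connect to the offloaded database server (credentials may be needed).

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Nagios XI).

# Print the absolute difference, in seconds, between two epoch timestamps.
clock_skew_seconds() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Print how many regular files sit directly in the given directory.
count_checkresults() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Possible usage on the XI server (dbhost is a placeholder):
#   clock_skew_seconds "$(date +%s)" \
#       "$(mysql -h dbhost -N -e 'SELECT UNIX_TIMESTAMP()')"
#   count_checkresults /usr/local/nagios/var/spool/checkresults/
```

A skew of more than a few seconds, or a file count that keeps growing between runs, would point to the same situation as the other client's.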
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Monitoring Engine Event Queue anomaly?

Post by Box293 »

Two very good questions.

When I looked at what was happening around 2013/04/04, I found the following. This was a week after we had relocated our environment to a new datacenter. One of the iSCSI switches in a stack of two rebooted, so all VMs hung for about 10-30 seconds while the SAN controllers transitioned to the other switch. The Nagios XI VM was running on one of the SANs connected to these iSCSI switches. I'm not sure whether it is related, but before the relocation I disabled (in CCM) a lot of services and hosts that would no longer exist in the new datacenter. The configuration applied OK, but these old services and hosts were left in the database for about four weeks afterwards.

As for the checkresults folder, there were about 1400 files in it when I had a look. Interestingly, 1070 of them were created in 2012, 2011 and 2010. When watching the folder, I could see files created in 2013 being processed correctly and disappearing soon thereafter. So I've just deleted those 1070 files and I'm waiting for the database maintenance task to complete.

I'll PM the results of the commands you requested and I'll get back to you in a couple of hours to see if the problem has been resolved.
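
For anyone wanting to reproduce the per-year breakdown of the checkresults folder described above, something like the following might do. It is only a sketch: files_per_year is a made-up name, it groups by modification time (used here as a stand-in for creation time, which find does not normally expose), and it relies on GNU find's -printf, which CentOS provides.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): print "YEAR COUNT" for the
# regular files directly inside a directory, grouped by the year of each
# file's last modification, oldest year first.
files_per_year() {
    find "$1" -maxdepth 1 -type f -printf '%TY\n' |
        sort |
        uniq -c |
        awk '{ print $2, $1 }'
}

# Possible usage:
#   files_per_year /usr/local/nagios/var/spool/checkresults/
```

Files from earlier years that never disappear are stale leftovers rather than results still waiting to be processed.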
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Monitoring Engine Event Queue anomaly?

Post by Box293 »

Deleting those files has not made a difference.

The database maintenance job has run twice since then and nothing seems to have changed. Scheduled events over time went up to 4000+ as the job ran over a 20-minute window.
Locked