Monitoring Engine Event Queue anomaly?
Re: Monitoring Engine Event Queue anomaly?
We are still trying to pinpoint the issue. There is also another customer experiencing a similar problem. As soon as we have a possible solution, we will let you know.
Be sure to check out our Knowledgebase for helpful articles and solutions!
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
- Contact:
Re: Monitoring Engine Event Queue anomaly?
No problems, let me know if there is any other information you require or if you want to establish a remote session to have a look at our server.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Monitoring Engine Event Queue anomaly?
Run the following commands and show us the output:
Code: Select all
tail -n 200 /var/log/messages
tail -n 200 /var/log/mysqld.log
- scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Monitoring Engine Event Queue anomaly?
Troy,
Searching through the forum, I see you had a very similar post about 6 months ago that was related to time syncing:
http://support.nagios.com/forum/viewtop ... =10#p40970
Could this problem be creeping back up?
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
- Contact:
Re: Monitoring Engine Event Queue anomaly?
I've just run the two commands and have sent you a private message with the output (there is some client-related data that I would prefer not to post publicly).
In relation to the time syncing, I have checked and the VM does NOT have VMware Tools syncing time with the ESXi host; I have CentOS configured to use NTP. When I ran the date command, the correct date and time were displayed.
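For anyone wanting to put a number on clock drift rather than eyeballing the output of date, here is a minimal sketch that compares two epoch timestamps. The helper name clock_skew and the host name otherhost are assumptions made up for illustration; they are not part of Nagios XI.

```shell
#!/bin/sh
# clock_skew A B
# Print the absolute difference, in seconds, between two epoch timestamps.
# (Hypothetical helper, not a Nagios XI tool.)
clock_skew() {
    a=$1
    b=$2
    diff=$((a - b))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    echo "$diff"
}

# Local clock as seconds since the epoch.
local_now=$(date +%s)

# Assumption: "otherhost" is a second machine reachable over ssh.
# Uncomment and adjust for your environment:
# remote_now=$(ssh otherhost date +%s)
# echo "skew: $(clock_skew "$local_now" "$remote_now") seconds"

# Self-contained demonstration with fixed values.
echo "demo skew: $(clock_skew 1000 1060) seconds"
```

The ssh round trip adds some latency of its own, so a result of a second or two does not necessarily indicate drift.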
- slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Monitoring Engine Event Queue anomaly?
Hmm, we are hoping to get into a remote session today with another client experiencing this issue. We'll let you know what we dig up!
Re: Monitoring Engine Event Queue anomaly?
Troy, do you remember if your issues started around 04/04/2013?
Run the following command and send me the output via PM:
Code: Select all
tail -n 200 /usr/local/nagiosxi/var/dbmaint.log
- slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Monitoring Engine Event Queue anomaly?
Scott resolved the issue with the other client. The problem was an offloaded MySQL database that was not synced with the XI server's time (off by a minute, give or take), and a huge amount of backed-up check results in:
Code: Select all
/usr/local/nagios/var/spool/checkresults/
Please let us know if this is the case for you, i.e. whether a pile-up of check results created the giant stack in the event queue you are experiencing.
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
- Contact:
Re: Monitoring Engine Event Queue anomaly?
Two very good questions.
When I look at what was happening around 2013/04/04, I found the following. This was a week after we had relocated our environment to a new datacenter. We had a problem with one of the iSCSI switches in a stack of two, which rebooted, so during this time all VMs hung for about 10-30 seconds while the SAN controllers transitioned to the other switch. The Nagios XI VM was running on one of the SANs connected to these iSCSI switches. Not sure if it was related, but before the relocation I disabled (in CCM) a lot of services and hosts that would no longer exist in the new datacenter. The configuration applied OK, but these old services and hosts were left in the database for about four weeks afterwards.
As for the checkresults folder, there were about 1400 files in it when I had a look. Interestingly, 1070 of those files were created in 2012, 2011 and 2010. When watching the folder, files created in 2013 were being processed correctly and disappearing soon thereafter. So I've just deleted those 1070 files and I'm waiting for the database maintenance task to complete.
I'll PM the results of the commands you requested and I'll get back to you in a couple of hours to see if the problem has been resolved.
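For reference, a check like the one described above could be sketched roughly as follows. This is only an illustration: the function name count_stale_results and the 1-day threshold are assumptions, and the spool path is the one quoted earlier in this thread. It counts files and deletes nothing.

```shell
#!/bin/sh
# count_stale_results DIR DAYS
# Count regular files directly inside DIR whose age, in whole 24-hour
# periods, is greater than DAYS. (Hypothetical helper, not a Nagios tool.)
count_stale_results() {
    dir=$1
    days=$2
    # -mtime +N matches files whose age in whole days exceeds N.
    find "$dir" -maxdepth 1 -type f -mtime +"$days" | wc -l | tr -d ' '
}

# Spool path as quoted in this thread; the threshold is an arbitrary choice.
SPOOL=/usr/local/nagios/var/spool/checkresults
if [ -d "$SPOOL" ]; then
    echo "stale check result files: $(count_stale_results "$SPOOL" 1)"
fi
```

Listing the matches first (the same find command without the wc) is a sensible step before removing anything, since recent files in that folder are still waiting to be processed.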
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
- Contact:
Re: Monitoring Engine Event Queue anomaly?
Deleting those files has not made a difference.
The database maintenance job has run twice since and nothing seems to have changed. Scheduled events over time went up to 4000+ as the job ran over a 20-minute window.