Weird scheduling issues
Posted: Tue Apr 09, 2013 7:52 pm
Hi guys,
I'm still only beginning my investigation, but thought I would open a ticket here in case you've seen this before. For the last three days we've been having issues where Nagios never executes checks and just continually reschedules hosts 10 minutes into the future (which is our usual check interval).
Today is a little different: instead of doing this for every host like the previous couple of days, it is only doing it for some hosts with little to nothing in common (different templates, host groups, etc). Restarting Nagios seems to resolve the problem for somewhere between 12 and 24 hours before it starts again. We are currently running XI r1.6 and we've attempted a restart of the Nagios server. I've confirmed that this is occurring at the Nagios Core level and is not just some database oddity.
I'm hoping someone has seen something similar to this before and can save me a little time.
Thanks!
End of day edit:
Well, I learned nothing of value. The databases are a-ok, I've updated to XI 1.7, upgraded the VMware tools that were out of date, and discovered 8000 files in /tmp/ called checkXXXXXX, which I've removed (what's the deal with those?). I couldn't find anything else out of the ordinary. I've also done the requisite amount of finger crossing, so let me know if there's something else I should check.
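In case it helps anyone reproduce this, here is a rough sketch of how the stuck hosts could be listed straight from Nagios Core's status file, bypassing the database. It assumes the usual `hoststatus { ... }` block layout with `host_name`, `last_check` and `next_check` fields; the function names and the 20 minute threshold are just illustrative choices, and the status file path mentioned afterwards is an assumption about a default install, not something I've verified on every setup.

```python
def parse_hoststatus(text):
    """Return one dict of key/value strings per 'hoststatus { ... }' block."""
    hosts = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("hoststatus") and line.endswith("{"):
            current = {}                      # start of a host block
        elif line == "}" and current is not None:
            hosts.append(current)             # end of a host block
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return hosts


def stale_hosts(hosts, now, max_age):
    """Names of hosts whose last check is older than max_age seconds
    even though their next check is scheduled in the future."""
    stale = []
    for host in hosts:
        last_check = int(host.get("last_check", 0))
        next_check = int(host.get("next_check", 0))
        if now - last_check > max_age and next_check > now:
            stale.append(host.get("host_name", "?"))
    return stale
```

To use it, read the status file (assumed here to be /usr/local/nagios/var/status.dat), pass its contents to `parse_hoststatus`, then call `stale_hosts(hosts, int(time.time()), 1200)`. With a 10 minute check interval, anything that has gone more than 20 minutes without a check but still shows a future next check is a candidate for the rescheduling loop.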