All checks running at same time interval

Posted: Wed May 28, 2014 5:14 am
by WillemDH
Hello,

We have to shut down one of our datacenters on Friday, the one where our Nagios server is running, so I asked a colleague to move the Nagios server to another datacenter (VMware). He did this yesterday evening, and today I noticed that all the checks seem to run at the same time interval, whereas yesterday this was not the case.
See the screenshot of the Monitoring Engine Event Queue.

As this does not seem ideal, I already tried rebooting the server, but that did not help.
In http://support.nagios.com/forum/viewtop ... ention.dat, I was asked to delete the retention.dat file. Is this the only way to spread the service checks more evenly again?

I presume that the vMotion of the virtual Nagios XI server caused the checks to lag and then all start at the same time. Is there any way to prevent this from happening? Next week I'll have to move the server back to the original datacenter.

Grtz

Willem

Re: All checks running at same time interval

Posted: Wed May 28, 2014 9:41 am
by abrist
This is usually a sign that the server fell behind in scheduling checks, which is typically caused by performance issues. Is the load average higher than normal? How about the I/O wait?
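
For anyone who wants to put numbers on those two questions, here is a minimal Python sketch. It assumes a Linux host where /proc/stat is readable and where the aggregate "cpu" line has the usual layout (user, nice, system, idle, iowait, ...). The function names are made up for illustration and are not part of Nagios.

```python
import os
import time


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line, which is the first line of /proc/stat."""
    with open(path) as f:
        return f.readline()


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two 'cpu' line samples."""
    def counters(line):
        # Field 0 is the label "cpu"; fields 1-8 are assumed to be
        # user nice system idle iowait irq softirq steal.
        return [int(x) for x in line.split()[1:9]]

    deltas = [a - b for a, b in zip(counters(after), counters(before))]
    total = sum(deltas)
    # Index 4 is the iowait counter in the assumed layout.
    return deltas[4] / total if total > 0 else 0.0


def sample(interval=5.0):
    """Return ((load1, load5, load15), iowait fraction over `interval` seconds)."""
    before = read_cpu_line()
    time.sleep(interval)
    after = read_cpu_line()
    return os.getloadavg(), iowait_fraction(before, after)
```

Calling sample() a few times during and after the migration and comparing the results with a quiet period should show whether the datastore was the bottleneck.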

Re: All checks running at same time interval

Posted: Wed May 28, 2014 10:11 am
by WillemDH
Andy,

As I said, we had to move the server to another datacenter (vMotion). This caused the latency, as we had to temporarily move the server to a less performant datastore. So I do know the reason, but the performance loss is only temporary and it should be OK now. The question is how I can solve this after a vMotion to another datacenter.

Grtz

Re: All checks running at same time interval

Posted: Wed May 28, 2014 3:10 pm
by abrist
Remove retention.dat and restart Nagios, or just wait it out while the server catches up.
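
If you go the "wait it out" route, one way to see whether the checks are spreading out again is to bucket the next scheduled check times. The Python sketch below is only an illustration: it assumes status.dat contains one next_check=<epoch seconds> line per host and service, and that a value of 0 means no active check is scheduled. The path in the usage note is the usual default and may differ on your install, and the function names are hypothetical.

```python
import re
import time
from collections import Counter

# Matches lines such as "\tnext_check=1401268440" (assumed status.dat format).
NEXT_CHECK = re.compile(r"^\s*next_check=(\d+)\s*$")


def next_check_histogram(lines, bucket_seconds=60):
    """Count scheduled checks per time bucket, keyed by bucket start (epoch)."""
    buckets = Counter()
    for line in lines:
        match = NEXT_CHECK.match(line)
        if not match:
            continue
        ts = int(match.group(1))
        if ts > 0:  # assumption: 0 means no active check is scheduled
            buckets[ts - ts % bucket_seconds] += 1
    return dict(sorted(buckets.items()))


def report(path, bucket_seconds=60):
    """Print one line per bucket: local start time and number of checks."""
    with open(path) as f:
        histogram = next_check_histogram(f, bucket_seconds)
    for start, count in histogram.items():
        stamp = time.strftime("%H:%M:%S", time.localtime(start))
        print(stamp, count)
```

For example, report("/usr/local/nagios/var/status.dat") prints the number of checks due in each minute. A few very large buckets suggest the checks are still bunched together; roughly equal buckets suggest the scheduler has caught up.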

Re: All checks running at same time interval

Posted: Wed Jun 11, 2014 5:58 am
by WillemDH
It seems the server managed to distribute the load evenly again. Thread can be closed.

Thanks!

Willem