Monitoring Engine Process 15 minute delayed start?

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
ockmeyer
Posts: 74
Joined: Mon Jun 25, 2012 2:17 pm

Re: Monitoring Engine Process 15 minute delayed start?

Post by ockmeyer »

I haven't waited the full 15 minutes, but after a few minutes it still shows the Monitoring Engine Process as "not running".
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Post by mguthrie »

Would you be interested in a remote session to take a look at this, maybe tomorrow? If so, let me know and we'll discuss the details in a PM.
ockmeyer

sure
mguthrie

Details sent over PM.
ockmeyer

Mike worked with me on this and we concluded that the retention.dat file was too large. Deleting it and letting it regenerate seems to have fixed the problem.

I verified it with a second server having an identical problem.

Thanks for all the help, Mike!
mguthrie

Thanks for updating this thread with the solution as well. If you ever see this issue come up again, could you send us the oversized retention.dat file so we can try to trace why it got so large?
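
For anyone following along, one way to keep the oversized file available for that kind of follow-up is to rename it instead of deleting it. The sketch below is only an illustration: `set_aside` is a hypothetical helper, not part of Nagios XI, and it assumes the monitoring engine has already been stopped, since Nagios writes its retention data again on shutdown. The real file location depends on the install (`/usr/local/nagios/var/retention.dat` is a common default, but treat that as an assumption). The demo runs against a throwaway temp file.

```python
import os
import tempfile
import time


def set_aside(path):
    """Rename `path` to `path.<timestamp>.bak`; return (backup_path, size_in_bytes).

    Hypothetical helper: run it only while the monitoring engine is stopped.
    """
    size = os.path.getsize(path)
    backup = "{}.{}.bak".format(path, time.strftime("%Y%m%d-%H%M%S"))
    os.rename(path, backup)
    return backup, size


# Demo against a temporary stand-in file, not a real retention.dat.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "retention.dat")
    with open(demo, "w") as fh:
        fh.write("hostcomment {\n}\n")
    backup_path, size_bytes = set_aside(demo)
    print("moved", size_bytes, "bytes to", os.path.basename(backup_path))
```

After the rename, starting the engine again should let it generate a fresh retention.dat, which is the same effect as the delete described above.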
ockmeyer

Will do.
mguthrie

After looking through the file, I'm noticing that there are almost 90,000 host comments, all related to:
"This host has been scheduled for fixed downtime..."

Anything goofy showing up in your downtime or recurring downtime pages?
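
A rough way to get that kind of count yourself is sketched below. It assumes retention.dat uses the usual Nagios Core layout of `blocktype {` ... `}` sections with `key=value` lines, and that comment text sits in a `comment_data=` line inside `hostcomment` blocks. `summarize_retention` and the sample data are made up for illustration.

```python
from collections import Counter


def summarize_retention(text, prefix_len=40):
    """Return (block type counts, counts of hostcomment text prefixes)."""
    block_counts = Counter()
    comment_counts = Counter()
    current = None  # name of the block we are inside, or None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if line.endswith("{"):
                current = line[:-1].strip()
                block_counts[current] += 1
        elif line == "}":
            current = None
        elif current == "hostcomment" and line.startswith("comment_data="):
            # Truncate so comments that differ only in their dates group together.
            text_part = line[len("comment_data="):]
            comment_counts[text_part[:prefix_len]] += 1
    return block_counts, comment_counts


# Invented sample in the assumed format; a real file would be read from disk.
SAMPLE = """\
host {
host_name=web01
}
hostcomment {
host_name=web01
comment_data=This host has been scheduled for fixed downtime from A to B.
}
hostcomment {
host_name=web01
comment_data=This host has been scheduled for fixed downtime from C to D.
}
"""

blocks, comments = summarize_retention(SAMPLE)
for name, count in blocks.most_common():
    print(name, count)
for text_prefix, count in comments.most_common(5):
    print(count, text_prefix)
```

On a real file, a `hostcomment` count far larger than the number of hosts would point at the same problem.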
ockmeyer

I created three recurring downtime schedules the Friday before all of this started. Each one corresponds to one of the three maintenance windows we have, and they are based on hostgroups. People were complaining about getting alerts when they were doing maintenance and I thought this would eliminate that, but it appears to have created a bigger problem.

With hundreds of hosts in each hostgroup, is there a better way of configuring this without the side effect of such a large file?
mguthrie

Unless you've got 30k hosts+services, there's something goofy going on with that scheduler. I'm wondering if LOTS of duplicate downtime schedules are being created somehow. Even if you've got 10k checks total, there shouldn't be that many comments in the file.

You did set this up correctly. We'll do some digging on the possibility of duplicate schedules getting created and see what we can find out...
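
In the meantime, a quick check for duplicates could look something like the sketch below. It is an illustration only, not the scheduler's own logic: it assumes `hostdowntime` blocks in retention.dat carry `host_name`, `start_time` and `end_time` keys, and it treats entries sharing all three as duplicates. The function names and sample data are hypothetical.

```python
from collections import Counter


def parse_blocks(text, wanted):
    """Return a list of dicts, one per `wanted {` ... `}` block in `text`."""
    blocks = []
    current = None  # dict for the block being read, or None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if line.endswith("{") and line[:-1].strip() == wanted:
                current = {}
        elif line == "}":
            blocks.append(current)
            current = None
        elif "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


def duplicate_downtimes(text):
    """Map (host, start, end) to its count, for keys seen more than once."""
    keys = Counter(
        (b.get("host_name"), b.get("start_time"), b.get("end_time"))
        for b in parse_blocks(text, "hostdowntime")
    )
    return {key: n for key, n in keys.items() if n > 1}


# Invented sample: web01 has the same window scheduled twice.
SAMPLE = """\
hostdowntime {
host_name=web01
start_time=1349060400
end_time=1349067600
}
hostdowntime {
host_name=web01
start_time=1349060400
end_time=1349067600
}
hostdowntime {
host_name=db01
start_time=1349060400
end_time=1349067600
}
"""

for (host, start, end), n in duplicate_downtimes(SAMPLE).items():
    print(host, start, end, "appears", n, "times")
```

If the same host and window show up many times over, that would support the duplicate-schedule theory.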