Scheduling queue freezes

aisebouma
Posts: 14
Joined: Mon Dec 07, 2015 9:05 am

Scheduling queue freezes

Post by aisebouma »

We are running Nagios Core 4.1.1 on Ubuntu Server 14.04 in a VMware virtual machine. About once a month the scheduling queue freezes, and even restarting Nagios does not resolve it; I also have to delete retention.dat.

The log suddenly shows:
[1449015630] Warning: A system time change of 1507 seconds (0d 0h 25m 7s forwards in time) has been detected. Compensating...
[1449040674] Warning: A system time change of 25044 seconds (0d 6h 57m 24s forwards in time) has been detected. Compensating...
[1449065417] Warning: A system time change of 24743 seconds (0d 6h 52m 23s forwards in time) has been detected. Compensating...
[1449087641] Warning: A system time change of 22224 seconds (0d 6h 10m 24s forwards in time) has been detected. Compensating...
[1449112824] Warning: A system time change of 25183 seconds (0d 6h 59m 43s forwards in time) has been detected. Compensating...
[1449134987] Warning: A system time change of 22163 seconds (0d 6h 9m 23s forwards in time) has been detected. Compensating...
...

The server uses NTP to keep the clock up to date, and the hardware clock shows no large deviation either.
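
One property of the log excerpt above can be checked mechanically: from the second warning onward, the size of each reported time change equals the number of seconds elapsed since the previous warning. The following is a minimal Python sketch of that check, assuming the log lines have exactly the format shown in the excerpt; the function names are illustrative and not part of Nagios.

```python
import re

# The warning lines exactly as quoted in the post above.
LOG_EXCERPT = """\
[1449015630] Warning: A system time change of 1507 seconds (0d 0h 25m 7s forwards in time) has been detected. Compensating...
[1449040674] Warning: A system time change of 25044 seconds (0d 6h 57m 24s forwards in time) has been detected. Compensating...
[1449065417] Warning: A system time change of 24743 seconds (0d 6h 52m 23s forwards in time) has been detected. Compensating...
[1449087641] Warning: A system time change of 22224 seconds (0d 6h 10m 24s forwards in time) has been detected. Compensating...
[1449112824] Warning: A system time change of 25183 seconds (0d 6h 59m 43s forwards in time) has been detected. Compensating...
[1449134987] Warning: A system time change of 22163 seconds (0d 6h 9m 23s forwards in time) has been detected. Compensating...
"""

# Matches "[epoch] Warning: A system time change of N seconds (... forwards|backwards in time)".
PATTERN = re.compile(
    r"^\[(\d+)\] Warning: A system time change of (\d+) seconds "
    r"\(.*?(forwards|backwards) in time\)"
)


def parse_time_changes(text):
    """Return a list of (log_timestamp, reported_seconds, direction) tuples."""
    events = []
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            events.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return events


def gaps_between_warnings(events):
    """Seconds elapsed between each warning and the one before it."""
    return [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]


events = parse_time_changes(LOG_EXCERPT)
gaps = gaps_between_warnings(events)
reported = [seconds for _, seconds, _ in events[1:]]

for gap, jump in zip(gaps, reported):
    print(f"gap since previous warning: {gap:>6} s   reported change: {jump:>6} s")
```

For this excerpt every gap matches the reported change. Whether that reflects genuine clock steps or Nagios only noticing the time at those moments cannot be determined from the log alone.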
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Scheduling queue freezes

Post by rkennedy »

What resources do you have allocated to this virtual machine? How many hosts / service checks are running?
Former Nagios Employee
aisebouma
Posts: 14
Joined: Mon Dec 07, 2015 9:05 am

Re: Scheduling queue freezes

Post by aisebouma »

rkennedy wrote: What resources do you have allocated to this virtual machine? How many hosts / service checks are running?
3 GB of memory, more than enough disk space, and 1 processor. The average CPU load is 10%.

It checks 66 hosts and 873 services.
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: Scheduling queue freezes

Post by hsmith »

Which logs are you checking for information when this happens?
Former Nagios Employee.
me.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Scheduling queue freezes

Post by jolson »

Are you running pnp4nagios on this server?

ps -ef | grep npcd
If not, these time deviations are abnormal.

Nagios just detects the system time change, but has no control over actually changing it. This is _almost certainly_ NTP changing the time of your system for one reason or another.
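
To check independently of Nagios whether the system clock really jumps, the wall clock can be compared against a monotonic clock, which is not affected by clock steps. The following is a minimal Python sketch of that idea. It is not how Nagios itself detects the change; `clock_jump` and `watch` are hypothetical helper names, and the 5-second threshold is an arbitrary assumption.

```python
import time


def clock_jump(prev_wall, prev_mono, now_wall, now_mono):
    """Seconds the wall clock moved beyond what the monotonic clock measured.

    A value near zero means both clocks agree; a large positive value means
    the wall clock was stepped forwards, a large negative value backwards.
    """
    return (now_wall - prev_wall) - (now_mono - prev_mono)


def watch(iterations, interval=1.0, threshold=5.0):
    """Sample both clocks `iterations` times and collect jumps above `threshold`."""
    jumps = []
    prev_wall, prev_mono = time.time(), time.monotonic()
    for _ in range(iterations):
        time.sleep(interval)
        now_wall, now_mono = time.time(), time.monotonic()
        jump = clock_jump(prev_wall, prev_mono, now_wall, now_mono)
        if abs(jump) > threshold:
            jumps.append((now_wall, jump))
            print(f"[{int(now_wall)}] wall clock moved {jump:+.1f} s relative to monotonic clock")
        prev_wall, prev_mono = now_wall, now_mono
    return jumps
```

Calling, for example, `watch(3600, interval=1.0)` in a separate session would report any step larger than the threshold during roughly one hour, which can then be compared against the timestamps in nagios.log.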
aisebouma
Posts: 14
Joined: Mon Dec 07, 2015 9:05 am

Re: Scheduling queue freezes

Post by aisebouma »

hsmith wrote: Which logs are you checking for information when this happens?
/usr/local/nagios/var/nagios.log
aisebouma
Posts: 14
Joined: Mon Dec 07, 2015 9:05 am

Re: Scheduling queue freezes

Post by aisebouma »

jolson wrote: Are you running pnp4nagios on this server?

ps -ef | grep npcd
If not, these time deviations are abnormal.

Nagios just detects the system time change, but has no control over actually changing it. This is _almost certainly_ NTP changing the time of your system for one reason or another.
No, I am not running pnp4nagios.

First of all, a 1500-second time deviation is not normal for NTP.

Second, why would Nagios stop processing the scheduling queue after a time change?

Third, why is it not robust enough to resume processing the queue when Nagios is restarted?
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Scheduling queue freezes

Post by rkennedy »

Can you post the output of the following command for us to look at?

ntpstat
Former Nagios Employee
aisebouma
Posts: 14
Joined: Mon Dec 07, 2015 9:05 am

Re: Scheduling queue freezes

Post by aisebouma »

rkennedy wrote: Can you post the result of the following command for us to look at? ntpstat
Sure:

root@tibet:~# ntpstat
synchronised to NTP server (10.116.11.1) at stratum 4
time correct to within 388 ms
polling server every 1024 s
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Scheduling queue freezes

Post by tmcdonald »

aisebouma wrote: First of all a 1500 seconds time deviation is not normal for ntp.
We agree; @jolson mentioned this above.
aisebouma wrote: Second, why would nagios stop processing the scheduled queue after a time change?
Nagios uses in-memory timestamps to determine when the next checks should be run, notifications sent, etc. When these skew from the system clock, you see the "Compensating" message above. All these timestamps are periodically written to disk in retention.dat, which is used to store state between restarts. This is why deleting that file causes the queue to refresh.
aisebouma wrote: Third, why is it not robust enough to restart processing the queue when nagios is restarted?
As mentioned above, the retention.dat file is what causes state to be restored on a restart, and if the times saved in that file are off, then the times will be off when Nagios restarts. This is a drawback to be sure, but the alternative is that nothing gets saved on a restart and everything essentially gets re-checked from square one. State retention can be configured on or off, but the benefits far outweigh the drawbacks. At any rate, a little skew over time is usually dealt with in stride, but a sudden 6 hours is harder to deal with consistently.
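
Before deleting retention.dat, it may be useful to see whether the saved timestamps are in fact off. The following is a minimal Python sketch under stated assumptions: that the file consists of blocks of the form `type {` ... `}` containing `key=value` lines, and that `host` and `service` blocks carry `host_name`, `service_description` and `next_check` fields holding Unix timestamps. The sample data, the function names and the one-day threshold are all hypothetical.

```python
# Hypothetical sample in the assumed retention.dat layout; not real data.
SAMPLE = """\
host {
host_name=web01
last_check=1449015000
next_check=1449015300
}
service {
host_name=web01
service_description=PING
last_check=1449015000
next_check=1449134987
}
"""


def parse_blocks(text):
    """Parse 'type { key=value ... }' blocks into a list of dicts."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None and line.endswith(" {"):
            # Start of a new block; remember its type (e.g. "host", "service").
            current = {"_type": line[:-1].strip()}
        elif line == "}" and current is not None:
            blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return blocks


def suspicious_checks(blocks, now, max_ahead=86400):
    """Return (name, seconds_ahead) for checks scheduled more than max_ahead seconds after now."""
    flagged = []
    for block in blocks:
        if block["_type"] not in ("host", "service"):
            continue
        next_check = block.get("next_check", "")
        if not next_check.isdigit():
            continue
        ahead = int(next_check) - now
        if ahead > max_ahead:
            name = block.get("host_name", "?")
            if "service_description" in block:
                name += "/" + block["service_description"]
            flagged.append((name, ahead))
    return flagged


# Reference time chosen for the sample; on a live system use int(time.time()).
flagged = suspicious_checks(parse_blocks(SAMPLE), now=1449015630)
for name, ahead in flagged:
    print(f"{name}: next check scheduled {ahead} s in the future")
```

On a real system the text would be read from the retention file (for the installation in this thread, presumably under /usr/local/nagios/var/), ideally from a copy taken while Nagios is stopped.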

We recently saw another user with a similar issue, but his occurred every weekend and the clock skew was not nearly as consistent (yours seems to be at least in the 6h range). We still haven't found what caused it on his system, and they did a lot of work up front on their end to rule things out. A couple of things I would like to ask of you:

Are you running mod_gearman?
How precise is the "once a month" estimate? Is it on a particular calendar day? Week day?
Does anything in your environment/network occur around the time that this happens?
Former Nagios employee