
Warning in logs about a "system time change"

Posted: Thu Jan 09, 2020 12:48 pm
by optionstechnology
Hi,

Nagios stopped working for us last night for around 7 hours (checks were not being executed), and the only relevant entry I can see in the logs before it stopped is below:
[1578549619] Warning: A system time change of 3840 seconds (0d 1h 4m 0s forwards in time) has been detected. Compensating...
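For reference, the fields in that log line can be decoded with a short script. This is only a sketch: the regular expression is an assumption based on the single line quoted above (including the guess that a backwards jump would be logged with the word "backwards"), and `parse_time_change` is a hypothetical helper name, not part of Nagios.

```python
import re
from datetime import datetime, timezone

# The log line exactly as quoted in the post.
LOG_LINE = ('[1578549619] Warning: A system time change of 3840 seconds '
            '(0d 1h 4m 0s forwards in time) has been detected. Compensating...')

# Assumed pattern, derived only from the one line above.
PATTERN = re.compile(
    r'^\[(\d+)\] Warning: A system time change of (\d+) seconds '
    r'\((\d+)d (\d+)h (\d+)m (\d+)s (forwards|backwards) in time\)'
)

def parse_time_change(line):
    """Return the decoded fields of a time-change warning, or None."""
    match = PATTERN.match(line)
    if match is None:
        return None
    epoch, seconds, days, hours, minutes, secs = (int(g) for g in match.groups()[:6])
    return {
        # The bracketed prefix is a Unix epoch timestamp; shown here in UTC.
        "logged_at": datetime.fromtimestamp(epoch, tz=timezone.utc),
        "jump_seconds": seconds,
        # Recompute the jump from the d/h/m/s breakdown as a sanity check.
        "breakdown_seconds": days * 86400 + hours * 3600 + minutes * 60 + secs,
        "direction": match.group(7),
    }

info = parse_time_change(LOG_LINE)
print(info["logged_at"].isoformat())  # 2020-01-09T06:00:19+00:00
print(info["jump_seconds"], info["direction"])  # 3840 forwards
```

So the warning was logged at 06:00:19 UTC on 9 January 2020, and the 3840-second figure agrees with the 1h 4m 0s breakdown.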

We monitor each Nagios instance from a separate instance located in a different DC. We have checks for the Nagios XI daemons, Nagios services, etc., yet we didn't get any alerts. Any idea why we didn't get an alert, why this happened, and how we can prevent it in future?

CPU usage was relatively high before the crash (it is usually quite high on this instance), but I've never seen this happen before, especially without any notification that the monitoring engine had stopped.

Thanks

Re: Warning in logs about a "system time change"

Posted: Thu Jan 09, 2020 4:45 pm
by mbellerue
So the system time leaped forward by about an hour, and the Nagios processes crashed? This may be a hard one to troubleshoot after the fact. Can you send in a system profile of the Nagios instance that was monitoring the crashed Nagios instance? I can probably spot why the working instance didn't alert. It would also be good to get a profile from the crashed Nagios instance.

As for the crashed instance, what was the remediation process? Did you restart the whole Nagios server, or did you just start the Nagios service? If you only started the Nagios service, did you make a note of its state before starting it? I'm wondering if the Nagios service was in some kind of hung state where it looked like it was still operational, but actually wasn't.
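
One way to catch a hung-but-running Nagios process is to alert on how recently the engine last wrote its status file, instead of only checking that the process exists. The following is a rough sketch under several assumptions: the path `/usr/local/nagios/var/status.dat` and the 300-second threshold are illustrative and should be taken from your own configuration, `check_status_file` is a hypothetical helper, and the check would have to run on the monitored host itself (for example through whatever agent you already use). It returns the standard plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import os
import time

# Assumed values; adjust to match the status_file setting on your instance.
STATUS_FILE = "/usr/local/nagios/var/status.dat"
MAX_AGE_SECONDS = 300

def check_status_file(path, max_age, now=None):
    """Return (exit_code, message) based on the age of the status file."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError as exc:
        # Missing or unreadable file: the state cannot be determined.
        return 3, f"UNKNOWN - cannot stat {path}: {exc}"
    if age > max_age:
        return 2, f"CRITICAL - {path} last updated {age:.0f}s ago (limit {max_age}s)"
    return 0, f"OK - {path} last updated {age:.0f}s ago"

def main():
    code, message = check_status_file(STATUS_FILE, MAX_AGE_SECONDS)
    print(message)
    return code
```

To use it as a plugin, call `sys.exit(main())` at the end of the script so the exit code reaches the monitoring instance. A process that is alive but no longer scheduling checks would stop refreshing the file, so this should go CRITICAL even when a plain process check stays green.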