service checks stop working for no apparent reason
Posted: Tue Aug 10, 2021 4:47 pm
I am having an issue in which all service checks stop working at the same time, for no apparent reason. When this happens, no errors are reported on the XI -> Admin status page, no check activity appears in /usr/local/nagios/var/nagios.log, and the values in the Last Check column on the XI -> Service Status page are not updated.
When the issue occurs, the only entries written to /usr/local/nagios/var/nagios.log are related to external commands, for example: "[1628629148] SERVICE DOWNTIME ALERT: 00000-0 -- as-tst-001.example.com;sshd daemon;CANCELLED; Scheduled downtime for service has been cancelled."
We use external commands to schedule/unschedule service downtimes.
The condition lasts for a varying period of time and then appears to recover on its own. When it recovers, the following is printed to /usr/local/nagios/var/nagios.log:
[1628629199] Warning: A system time change of 2001 seconds (0d 0h 33m 21s forwards in time) has been detected. Compensating...
In addition, the following is printed:
[1628629243] NDO-3: The following query failed while MySQL appears to be connected:
[1628629243] NDO-3: INSERT INTO nagios_downtimehistory (instance_id, downtime_type, object_id, entry_time, author_name, comment_data, internal_downtime_id, triggered_by_id, is_fixed, duration, scheduled_start_time, scheduled_end_time) VALUES (1,1,28595,FROM_UNIXTIME(1628628063),'joe','00893254',878153,0,1,93600,FROM_UNIXTIME(1628627925),FROM_UNIXTIME(1628721525)) ON DUPLICATE KEY UPDATE instance_id = VALUES(instance_id), downtime_type = VALUES(downtime_type), object_id = VALUES(object_id), entry_time = VALUES(entry_time), author_name = VALUES(author_name), comment_data = VALUES(comment_data), internal_downtime_id = VALUES(internal_downtime_id), triggered_by_id = VALUES(triggered_by_id), is_fixed = VALUES(is_fixed), duration = VALUES(duration), scheduled_start_time = VALUES(scheduled_start_time), scheduled_end_time = VALUES(scheduled_end_time)
This system was built with a manual install of Nagios XI 5.7.5, which completed without error, on top of a fresh install of Oracle Linux 8. I then restored a Nagios XI backup taken on a machine running CentOS 6 with the same Nagios XI version, 5.7.5. The restore completed without error and I have a "functioning" Nagios XI running on Oracle Linux 8.
I suspect the issue is related to the time drift, but I am not sure. It could also be related to the external commands.
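To help narrow this down, here is a minimal sketch of how the quiet periods and the time-change warnings could be pulled out of nagios.log so they can be lined up against each other. It assumes only the "[epoch] message" line format shown in the excerpts above. The function names (parse_entries, find_gaps, find_time_changes, fmt) and the 300 second threshold are hypothetical choices for illustration, not part of Nagios.

```python
import re
from datetime import datetime, timezone

# Matches the leading "[epoch] message" format seen in the nagios.log excerpts.
LINE_RE = re.compile(r"^\[(\d+)\]\s(.*)$")
# Matches the time-change warning text quoted in this post.
TIME_CHANGE_RE = re.compile(r"system time change of (\d+) seconds")


def parse_entries(lines):
    """Yield (epoch, message) for every line that starts with [epoch]."""
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            yield int(match.group(1)), match.group(2)


def find_gaps(lines, threshold=300):
    """Return (previous_epoch, next_epoch, gap_seconds) for each pair of
    consecutive log entries more than `threshold` seconds apart.
    The 300 second default is an arbitrary assumption."""
    gaps = []
    previous = None
    for epoch, _ in parse_entries(lines):
        if previous is not None and epoch - previous > threshold:
            gaps.append((previous, epoch, epoch - previous))
        previous = epoch
    return gaps


def find_time_changes(lines):
    """Yield (epoch, seconds) for each time-change warning in the log."""
    for epoch, message in parse_entries(lines):
        match = TIME_CHANGE_RE.search(message)
        if match:
            yield epoch, int(match.group(1))


def fmt(epoch):
    """Render an epoch timestamp as a readable UTC string."""
    moment = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")
```

Passing an open file handle for /usr/local/nagios/var/nagios.log to find_gaps and find_time_changes, and printing the results through fmt, would show whether each quiet period ends at the same moment a time-change warning is logged. Because external command entries keep appearing during the outage, a gap-based scan may need to be restricted to service check lines to be useful.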
I need to understand why the service checks stop working at random times, and correct the issue.
Thanks in advance for your help.
-wr