Nagios Support Forum

Posted: **Tue Aug 10, 2021 4:47 pm**

I am having an issue in which all service checks stop working at the same time, for no apparent reason. When this happens, no errors are reported on the XI -> Admin status page, no activity appears in /usr/local/nagios/var/nagios.log, and values in the Last Check column on the XI -> Service Status page are not updated.

When the issue occurs, the only items printed to /usr/local/nagios/var/nagios.log are related to external commands, for example: "[1628629148] SERVICE DOWNTIME ALERT: 00000-0 -- as-tst-001.example.com;sshd daemon;CANCELLED; Scheduled downtime for service has been cancelled."
We use external commands to schedule/unschedule service downtimes.

The condition continues for a random period of time and appears to self-recover. When it recovers, the following is printed to /usr/local/nagios/var/nagios.log:

[1628629199] Warning: A system time change of 2001 seconds (0d 0h 33m 21s forwards in time) has been detected. Compensating...

In addition, the following is printed:

[1628629243] NDO-3: The following query failed while MySQL appears to be connected:
[1628629243] NDO-3: INSERT INTO nagios_downtimehistory (instance_id, downtime_type, object_id, entry_time, author_name, comment_data, internal_downtime_id, triggered_by_id, is_fixed, duration, scheduled_start_time, scheduled_end_time) VALUES (1,1,28595,FROM_UNIXTIME(1628628063),'joe','00893254',878153,0,1,93600,FROM_UNIXTIME(1628627925),FROM_UNIXTIME(1628721525)) ON DUPLICATE KEY UPDATE instance_id = VALUES(instance_id), downtime_type = VALUES(downtime_type), object_id = VALUES(object_id), entry_time = VALUES(entry_time), author_name = VALUES(author_name), comment_data = VALUES(comment_data), internal_downtime_id = VALUES(internal_downtime_id), triggered_by_id = VALUES(triggered_by_id), is_fixed = VALUES(is_fixed), duration = VALUES(duration), scheduled_start_time = VALUES(scheduled_start_time), scheduled_end_time = VALUES(scheduled_end_time)

This system was built (manual install of Nagios XI 5.7.5, which completed without error) on top of a fresh install of Oracle Linux 8. I then restored a Nagios XI backup taken on a machine running CentOS 6 with the same Nagios XI version, 5.7.5. The restore completed without error, and I now have a "functioning" Nagios XI running on Oracle Linux 8.

I suspect the issue is related to the time drift, but I am not sure. It could also be related to external commands.

I need to understand why the service checks stop working at random times, and correct the issue.

Thanks in advance for your help.

-wr
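
For anyone correlating these warnings with the stall windows, here is a minimal sketch that lists the "system time change" warnings from nagios.log with readable timestamps. The function name `time_jumps` is hypothetical, and `date -d @EPOCH` assumes GNU date, which is an assumption about the platform rather than something stated in the post.

```shell
# List "system time change" warnings from a Nagios log with readable times.
# Usage (path taken from the post): time_jumps /usr/local/nagios/var/nagios.log
time_jumps() {
    local log="$1"
    # Matching lines look like:
    # [1628629199] Warning: A system time change of 2001 seconds (...)
    grep -F 'A system time change of' "$log" |
    while IFS= read -r line; do
        epoch="${line#\[}"                  # drop the leading "["
        epoch="${epoch%%]*}"                # keep the digits before "]"
        secs="${line#*time change of }"     # text after "time change of "
        secs="${secs%% *}"                  # first word is the jump in seconds
        # GNU date syntax (assumption): convert the epoch to UTC.
        printf '%s jump=%ss\n' \
            "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S UTC')" "$secs"
    done
}
```

The times it prints can then be compared against the moments the Last Check column stopped updating.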

Posted: **Wed Aug 11, 2021 9:22 am**

Nagios event processing has stalled again.

This appears to be related to a table lock issue.

Here's what I see:
mysql> show full processlist;

| 8 | ndoutils | localhost | nagios | Execute | 39 | updating | UPDATE nagios_commenthistory SET deletion_time = FROM_UNIXTIME(1628686898), deletion_time_usec = 792110 WHERE comment_time = FROM_UNIXTIME(1628684788) AND internal_comment_id = 1211543 |
| 9 | ndoutils | localhost | nagios | Execute | 39 | Waiting for table level lock | INSERT INTO nagios_commenthistory (instance_id, comment_type, entry_type, object_id, comment_time, internal_comment_id, author_name, comment_data, is_persistent, comment_source, expires, expiration_time, entry_time, entry_time_usec) VALUES (1,2,2,38965,FROM_UNIXTIME(1628691410),1211857,'joe','This service has been scheduled for fixed downtime from 08-12-2021 00:00:00 to 08-13-2021 06:00:00. Notifications for the service will not be sent out during that time period.',0,0,0,FROM_UNIXTIME(0),FROM_UNIXTIME(1628691410),66950) ON DUPLICATE KEY UPDATE instance_id = VALUES(instance_id), comment_type = VALUES(comment_type), entry_type = VALUES(entry_type), object_id = VALUES(object_id), comment_time = VALUES(comment_time), internal_comment_id = VALUES(internal_comment_id), author_name = VALUES(author_name), comment_data = VALUES(comment_data), is_persistent = VALUES(is_persistent), comment_source = VALUES(comment_source), expires = VALUES(expires), expiration_time = VALUES(expiration_time), entry_time = VALUES(entry_time), entry_time_usec = VALUES(entry_time_usec) |
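
The processlist above shows one ndoutils connection updating nagios_commenthistory while another waits on a table-level lock. A small sketch for pulling only the lock-waiting rows out of the processlist follows. The function name `lock_waits` is hypothetical; it assumes the tab-separated batch output of the `mysql` client with the standard column order (Id, User, Host, db, Command, Time, State, Info). Table-level lock waits are usually associated with storage engines that lock whole tables, such as MyISAM, so the commented engine query may also be worth running. Whether that is the cause here is not established in this thread.

```shell
# Print only the connections that are waiting on a lock.
# stdin: tab-separated output of `mysql -B -e 'SHOW FULL PROCESSLIST'`
# Usage (credentials are an assumption):
#   mysql -u root -p -B -e 'SHOW FULL PROCESSLIST' | lock_waits
lock_waits() {
    awk -F'\t' '
        # Skip the header row; column 7 is State, column 8 is Info.
        NR > 1 && $7 ~ /^Waiting for .*lock/ {
            printf "id=%s time=%ss state=%s query=%.60s\n", $1, $6, $7, $8
        }'
}

# To see which storage engine the contended table uses, a query such as this
# can be run in the mysql client (schema name "nagios" taken from the
# processlist above):
#   SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES
#   WHERE TABLE_SCHEMA = 'nagios' AND TABLE_NAME = 'nagios_commenthistory';
```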

Posted: **Wed Aug 11, 2021 10:51 am**

Hello @warapp

Thanks for reaching out. What is the time difference between the Nagios XI server and the devices that you are checking? Also, let's get the System Profile so we can see what things look like from that end.

To send us your system profile:

1) Log in to the Nagios XI GUI using a web browser.
2) Click the "Admin" > "System Profile" menu.
3) Click the "Download Profile" button.
4) Save the profile.zip file and send it via private message.

Thanks,
Perry

Posted: **Wed Aug 11, 2021 4:04 pm**

I've sent you the profile via pm.

This system is not yet stable.

I now see activity in /usr/local/nagios/var/nagios.log indicating that service checks are running and notifications are being sent, but values in the Last Check column on the XI -> Service Status page are not correct. They show times from about 10 hours ago.

Posted: **Wed Aug 11, 2021 4:40 pm**

Hello @warapp

Thanks for following up. Yeah, it looks like things are a bit wonky since migrating over to your 8.4.

I want to have you go ahead and run the database repair and then bounce the nagios.service.

Code:

/usr/local/nagiosxi/scripts/repair_databases.sh

Please let us know how the service checks are running, and also test whether you are able to add a host via the API, per your other support forum post.

Follow up with the updated System Profile and let us know what works and what does not.

Thanks,
Perry

Posted: **Thu Aug 12, 2021 8:01 am**

I've run the repair, restarted the service and sent the latest profile to you via pm.

I'm still seeing activity in /usr/local/nagios/var/nagios.log indicating that service checks are running and notifications are being sent, but the time values in the Last Check column on the XI -> Service Status page are not current; they show times from about 1 hour ago.

Posted: **Thu Aug 12, 2021 8:43 am**

The service checks are running and I see activity in /usr/local/nagios/var/nagios.log, but nothing gets updated in XI (see attached).

This is a restored system. It is using PostgreSQL and MySQL.

Posted: **Thu Aug 12, 2021 1:30 pm**

I have been reviewing the article here: https://support.nagios.com/kb/article/n ... ng-19.html

I see correct "Last Check Time" information in Core for the services. The issue is that XI does not show the correct times. I've worked through the article, in particular step 2, and nothing in it has corrected this issue.
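
Since Core shows correct times while XI does not, one way to pin down the gap is to read the last check time straight from Core's status file and compare it with what XI displays. The sketch below assumes the status file lives at /usr/local/nagios/var/status.dat (an assumption; the `status_file` setting in nagios.cfg is authoritative) and uses the usual `servicestatus { key=value }` block layout. The function name `core_last_check` is hypothetical.

```shell
# Print the last_check epoch that Nagios Core recorded for one service.
# Usage: core_last_check HOST SERVICE [STATUS_FILE]
core_last_check() {
    local host="$1" svc="$2" file="${3:-/usr/local/nagios/var/status.dat}"
    awk -v h="$host" -v s="$svc" '
        # Start of a service block: reset the fields we collect.
        /^servicestatus [{]/ { in_blk = 1; hn = ""; sd = ""; lc = ""; next }
        # End of a block: print last_check if host and service both match.
        in_blk && /^[ \t]*[}]/ {
            if (hn == h && sd == s) print lc
            in_blk = 0
            next
        }
        # Inside a block: split each "key=value" line at the first "=".
        in_blk {
            line = $0
            sub(/^[ \t]+/, "", line)
            eq = index(line, "=")
            if (eq == 0) next
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            if (key == "host_name") hn = val
            else if (key == "service_description") sd = val
            else if (key == "last_check") lc = val
        }
    ' "$file"
}
```

For example, `core_last_check as-tst-001.example.com 'sshd daemon'` prints an epoch value that can be compared with the Last Check column in XI for the same service.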

Posted: **Thu Aug 12, 2021 2:43 pm**

Hello @warapp

To follow up, I spoke to a couple of colleagues about the migration issues you have been having since the move over to 8.4. We would like you to go ahead and downgrade NDO3. Please take a full backup or VM snapshot before proceeding.

### STANDARD DOWNGRADE OF NDO3

Code:

systemctl stop nagios
cd /tmp
rm -rf /tmp/nagiosxi
wget https://assets.nagios.com/downloads/nagiosxi/5/xi-5.6.14.tar.gz
tar zxf xi-5.6.14.tar.gz
cd /tmp/nagiosxi/subcomponents/ndoutils
./install
systemctl enable ndo2db

Then edit your /usr/local/nagios/etc/nagios.cfg and make sure this line is uncommented:

Code:

broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg

Make sure this line is commented:

Code:

#broker_module=/usr/local/nagios/bin/ndo.so /usr/local/nagios/etc/ndo.cfg

Then start the nagios service:

Code:

systemctl start nagios

Please follow up with the results,
Perry
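
After editing nagios.cfg as described above, a quick sanity check can confirm that the ndomod.o broker line is active and the ndo.so line is commented out. This is a minimal sketch; `active_brokers` and `check_ndo_downgrade` are hypothetical helper names, not part of Nagios XI, and the default path is the one quoted in the instructions.

```shell
# Print the uncommented broker_module lines from a Nagios config file.
active_brokers() {
    local cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    # Commented lines start with "#", so they do not match this pattern.
    grep -E '^[[:space:]]*broker_module=' "$cfg"
}

# Succeed only if ndomod.o is active and ndo.so is not.
check_ndo_downgrade() {
    local cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    local active
    active=$(active_brokers "$cfg") || true   # no match leaves it empty
    if printf '%s\n' "$active" | grep -q 'ndomod\.o' &&
       ! printf '%s\n' "$active" | grep -q 'ndo\.so'; then
        echo "ok: ndomod.o active, ndo.so disabled"
    else
        echo "mismatch: review the broker_module lines in $cfg"
        return 1
    fi
}
```

Running `check_ndo_downgrade` before `systemctl start nagios` catches the case where both modules, or neither, are enabled. `systemctl is-active ndo2db` can be used alongside it to confirm the ndo2db daemon is running.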

Posted: **Fri Aug 13, 2021 7:43 am**

I followed your instructions and downgraded NDO. This had no effect on my two issues: API host add not working and XI statuses not updating consistently.

A couple more observations that may help direct us to a cause/fix.

1) Last Check times in XI, when I first logged in today, were reporting correctly. Up-to-date and aligned to what I saw in Core.

2) I applied your NDO changes soon after and since then XI Last Check times have not changed; they show times around when I applied the NDO changes and restarted.

service checks stop working for no apparent reason
