
Service notification - why?

Posted: Wed Aug 10, 2022 7:01 am
by invade
Hi

Is anyone able to explain why the following service notification was triggered:

Code:
2022-08-10 00:00:00+01:00 CURRENT HOST STATE: host.example.com;UP;HARD;1;OK: 10-08-2022 @ 08:55:13 AEST
2022-08-10 00:00:00+01:00 CURRENT SERVICE STATE: host.example.com;Samba;OK;HARD;1;OK: smb.service unit is active - 10-08-2022 @ 08:55:55 AEST
2022-08-10 12:07:11+01:00 HOST ALERT: host.example.com;DOWN;SOFT;1;Remote command execution failed: ssh: connect to host host.example.com port 22: Connection refused
2022-08-10 12:10:50+01:00 SERVICE ALERT: host.example.com;Samba;UNKNOWN;HARD;1;Remote command execution failed: ssh: connect to host host.example.com port 22: Connection refused
2022-08-10 12:12:11+01:00 HOST ALERT: host.example.com;DOWN;SOFT;2;Remote command execution failed: ssh: connect to host host.example.com port 22: Connection refused
2022-08-10 12:14:05+01:00 HOST ALERT: host.example.com;UP;SOFT;1;OK: 10-08-2022 @ 21:14:05 AEST
2022-08-10 12:16:05+01:00 SERVICE NOTIFICATION: support;host.example.com;Samba;UNKNOWN;service_notification;UNKNOWN - Plugin timed out
2022-08-10 12:20:54+01:00 SERVICE ALERT: host.example.com;Samba;OK;SOFT;1;OK: smb.service unit is active - 10-08-2022 @ 21:20:54 AEST


We use the check_by_ssh plugin to perform active checks on a number of hosts & services.

The Nagios host is running version 4.4.6 on Rocky Linux 8.

For both types of check we use the following settings:

Code:
max_check_attempts   13
retry_interval      5
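
To make the expectation behind these settings concrete, here is a rough Python sketch of the arithmetic. It assumes interval_length is left at 60 seconds in nagios.cfg (not stated in this post), and that every retry fails and runs on schedule; the function name is made up for illustration.

```python
from datetime import timedelta

# Values taken from the settings quoted above.
MAX_CHECK_ATTEMPTS = 13
RETRY_INTERVAL = 5  # expressed in units of interval_length

# Assumption: interval_length is 60 seconds (not confirmed in the post).
INTERVAL_LENGTH_SECONDS = 60


def time_to_hard_state(max_attempts, retry_interval,
                       interval_length=INTERVAL_LENGTH_SECONDS):
    """Time from the first failed check to the attempt that would reach
    max_check_attempts, if every retry fails and runs on schedule."""
    retries = max_attempts - 1  # the first failure is attempt 1
    return timedelta(seconds=retries * retry_interval * interval_length)


print(time_to_hard_state(MAX_CHECK_ATTEMPTS, RETRY_INTERVAL))  # 1:00:00
```

Under those assumptions, a problem would need to persist for about an hour of retries before the attempt counter reaches 13.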


In this case there was a ~7 minute period where the host was unavailable (which is usually a network problem).

There is no alert log entry for the first service check retry, but it looks like the notification was triggered after the first check retry failed, even though max_check_attempts is set to 13 and the host check was OK at that point.
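
As a rough illustration of how the relevant entry can be picked out of a larger log, the Python sketch below parses the ALERT lines from the excerpt above and lists any problem state logged as HARD on attempt 1. The regular expression and helper names are assumptions made for this example and only cover the line format shown in this post.

```python
import re

# Matches "HOST ALERT" / "SERVICE ALERT" lines in the format pasted above.
ALERT_RE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<kind>HOST|SERVICE) ALERT: (?P<fields>.*)$"
)

LOG_LINES = [
    "2022-08-10 12:07:11+01:00 HOST ALERT: host.example.com;DOWN;SOFT;1;Remote command execution failed: ssh: connect to host host.example.com port 22: Connection refused",
    "2022-08-10 12:10:50+01:00 SERVICE ALERT: host.example.com;Samba;UNKNOWN;HARD;1;Remote command execution failed: ssh: connect to host host.example.com port 22: Connection refused",
    "2022-08-10 12:12:11+01:00 HOST ALERT: host.example.com;DOWN;SOFT;2;Remote command execution failed: ssh: connect to host host.example.com port 22: Connection refused",
    "2022-08-10 12:16:05+01:00 SERVICE NOTIFICATION: support;host.example.com;Samba;UNKNOWN;service_notification;UNKNOWN - Plugin timed out",
]


def parse_alert(line):
    """Return a dict for an ALERT line, or None for any other line."""
    m = ALERT_RE.match(line)
    if m is None:
        return None
    if m.group("kind") == "HOST":
        # host;state;state_type;attempt;plugin output
        host, state, state_type, attempt, _output = m.group("fields").split(";", 4)
        service = None
    else:
        # host;service;state;state_type;attempt;plugin output
        host, service, state, state_type, attempt, _output = (
            m.group("fields").split(";", 5)
        )
    return {
        "ts": m.group("ts"),
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
    }


def hard_on_first_attempt(lines):
    """Problem states that were logged as HARD with attempt number 1."""
    hits = []
    for line in lines:
        alert = parse_alert(line)
        if (alert and alert["state_type"] == "HARD"
                and alert["attempt"] == 1
                and alert["state"] not in ("OK", "UP")):
            hits.append(alert)
    return hits


hits = hard_on_first_attempt(LOG_LINES)
for alert in hits:
    print(alert["ts"], alert["service"], alert["state"])
```

Run against the excerpt, this flags only the 12:10:50 Samba entry, which is logged as UNKNOWN;HARD;1 while the host itself is still in a SOFT DOWN state.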

If you need any more information, just let me know.

Thanks in advance.

Re: Service notification - why?

Posted: Mon Aug 15, 2022 3:44 am
by invade
Just to add that I have now enabled the “host_down_disable_service_checks” option, but I can still see service checks being run (and generating an alert) when the host is down, e.g.:

Code:
2022-08-15 00:56:29+01:00 HOST ALERT: host.example.com;DOWN;SOFT;1;UNKNOWN - Plugin timed out
2022-08-15 00:56:38+01:00 SERVICE ALERT: host.example.com;Samba;UNKNOWN;HARD;1;UNKNOWN - Plugin timed out