Service notification - why?

Post by **invade** » Wed Aug 10, 2022 7:01 am

Hi

Is anyone able to explain why the service notification in the following log was triggered?

2022-08-10 00:00:00+01:00 CURRENT HOST STATE: host.example.com;UP;HARD;1;OK: 10-08-2022 @ 08:55:13 AEST
2022-08-10 00:00:00+01:00 CURRENT SERVICE STATE: host.example.com;Samba;OK;HARD;1;OK: smb.service unit is active - 10-08-2022 @ 08:55:55 AEST
2022-08-10 12:07:11+01:00 HOST ALERT: host.example.com;DOWN;SOFT;1;Remote command execution failed: ssh: connect to host host.example.com port 22: Connection refused
2022-08-10 12:10:50+01:00 SERVICE ALERT: host.example.com;Samba;UNKNOWN;HARD;1;Remote command execution failed: ssh: connect to host host.example.com port 22: Connection refused
2022-08-10 12:12:11+01:00 HOST ALERT: host.example.com;DOWN;SOFT;2;Remote command execution failed: ssh: connect to host host.example.com port 22: Connection refused
2022-08-10 12:14:05+01:00 HOST ALERT: host.example.com;UP;SOFT;1;OK: 10-08-2022 @ 21:14:05 AEST
2022-08-10 12:16:05+01:00 SERVICE NOTIFICATION: support;host.example.com;Samba;UNKNOWN;service_notification;UNKNOWN - Plugin timed out
2022-08-10 12:20:54+01:00 SERVICE ALERT: host.example.com;Samba;OK;SOFT;1;OK: smb.service unit is active - 10-08-2022 @ 21:20:54 AEST

We use the check_by_ssh plugin to perform active checks on a number of hosts and services.

The Nagios server is running version 4.4.6 on Rocky Linux 8.

For both types of check we use the following settings:

max_check_attempts	13
retry_interval		5

In this case the host was unavailable for roughly 7 minutes (12:07 to 12:14 in the log above); when this happens it is usually a network problem.
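For anyone who wants to check the length of the outage, here is a minimal Python sketch that works it out from the first HOST ALERT (DOWN) and the HOST ALERT (UP) timestamps in the log above. The variable names are just my own labels.

```python
from datetime import datetime

# Timestamps copied from the HOST ALERT lines in the log above.
first_down = datetime.fromisoformat("2022-08-10 12:07:11+01:00")
back_up = datetime.fromisoformat("2022-08-10 12:14:05+01:00")

# Time between the first DOWN alert and the UP alert.
outage = back_up - first_down
print(outage)  # 0:06:54
```

This only measures the gap between the two log entries, so the real outage may have started slightly before the first failed check.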

There is no alert log entry for the first service check retry, but it looks like the notification was triggered after that first retry failed, even though max_check_attempts is set to 13 and the host check was OK by that point.
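To make the sequence easier to read, here is a rough Python sketch that splits the alert lines into named fields. The function name `parse_alert` and the dictionary keys are my own labels, and the field order (state, state type, attempt number, plugin output) is only what I infer from the log lines in this post, not something taken from the Nagios source.

```python
def parse_alert(line):
    """Split a HOST ALERT or SERVICE ALERT log line into named fields.

    Field order is assumed from the log excerpt in this post.
    """
    # The first ": " in the line comes right after "HOST ALERT" / "SERVICE ALERT".
    head, _, payload = line.partition(": ")
    date, time, kind = head.split(" ", 2)

    if kind == "SERVICE ALERT":
        host, service, state, state_type, attempt, output = payload.split(";", 5)
    elif kind == "HOST ALERT":
        host, state, state_type, attempt, output = payload.split(";", 4)
        service = None
    else:
        raise ValueError(f"not an alert line: {kind!r}")

    return {
        "timestamp": f"{date} {time}",
        "kind": kind,
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "output": output,
    }


# The 12:10:50 service alert from the log above.
service_line = (
    "2022-08-10 12:10:50+01:00 SERVICE ALERT: host.example.com;Samba;UNKNOWN;HARD;1;"
    "Remote command execution failed: ssh: connect to host host.example.com "
    "port 22: Connection refused"
)
alert = parse_alert(service_line)
print(alert["state"], alert["state_type"], alert["attempt"])  # UNKNOWN HARD 1
```

Parsed this way, the 12:10:50 service alert shows up as HARD on attempt 1, while the host alerts around it are still SOFT, which is the part I do not understand given max_check_attempts is 13.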

If you need any more information, just let me know.

Thanks in advance.

Post by **invade** » Mon Aug 15, 2022 3:44 am

Just to add that I have now enabled the "host_down_disable_service_checks" option, but I can still see service checks being run (and generating alerts) while the host is down, e.g.:

2022-08-15 00:56:29+01:00 HOST ALERT: host.example.com;DOWN;SOFT;1;UNKNOWN - Plugin timed out
2022-08-15 00:56:38+01:00 SERVICE ALERT: host.example.com;Samba;UNKNOWN;HARD;1;UNKNOWN - Plugin timed out
