Service notification - why?

Post by invade » Wed Aug 10, 2022 7:01 am

Hi

Is anyone able to explain why the following service notification was triggered?

Code:
2022-08-10 00:00:00+01:00 CURRENT HOST STATE: host.example.com;UP;HARD;1;OK: 10-08-2022 @ 08:55:13 AEST
2022-08-10 00:00:00+01:00 CURRENT SERVICE STATE: host.example.com;Samba;OK;HARD;1;OK: smb.service unit is active - 10-08-2022 @ 08:55:55 AEST
2022-08-10 12:07:11+01:00 HOST ALERT: host.example.com;DOWN;SOFT;1;Remote command execution failed: ssh: connect to host host.example.com port 22: Connection refused
2022-08-10 12:10:50+01:00 SERVICE ALERT: host.example.com;Samba;UNKNOWN;HARD;1;Remote command execution failed: ssh: connect to host host.example.com port 22: Connection refused
2022-08-10 12:12:11+01:00 HOST ALERT: host.example.com;DOWN;SOFT;2;Remote command execution failed: ssh: connect to host host.example.com port 22: Connection refused
2022-08-10 12:14:05+01:00 HOST ALERT: host.example.com;UP;SOFT;1;OK: 10-08-2022 @ 21:14:05 AEST
2022-08-10 12:16:05+01:00 SERVICE NOTIFICATION: support;host.example.com;Samba;UNKNOWN;service_notification;UNKNOWN - Plugin timed out
2022-08-10 12:20:54+01:00 SERVICE ALERT: host.example.com;Samba;OK;SOFT;1;OK: smb.service unit is active - 10-08-2022 @ 21:20:54 AEST
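
To make the timeline easier to read, the gap between the HARD service alert and the notification can be measured directly from the log. This is a minimal sketch in Python, assuming the log format shown above (timestamp, entry type, then semicolon-separated fields). The helper name parse_line is hypothetical and not part of Nagios; the three log lines are copied from the excerpt above.

```python
from datetime import datetime

# Three lines copied verbatim from the log excerpt above.
LOG_LINES = [
    "2022-08-10 12:10:50+01:00 SERVICE ALERT: host.example.com;Samba;UNKNOWN;HARD;1;Remote command execution failed: ssh: connect to host host.example.com port 22: Connection refused",
    "2022-08-10 12:14:05+01:00 HOST ALERT: host.example.com;UP;SOFT;1;OK: 10-08-2022 @ 21:14:05 AEST",
    "2022-08-10 12:16:05+01:00 SERVICE NOTIFICATION: support;host.example.com;Samba;UNKNOWN;service_notification;UNKNOWN - Plugin timed out",
]


def parse_line(line):
    """Split one log line into (timestamp, entry type, list of fields).

    Hypothetical helper: assumes "<date> <time+offset> <TYPE>: f1;f2;...".
    """
    date, clock, rest = line.split(" ", 2)
    entry_type, _, payload = rest.partition(": ")
    timestamp = datetime.fromisoformat(f"{date} {clock}")
    return timestamp, entry_type, payload.split(";")


entries = [parse_line(line) for line in LOG_LINES]

# Time of the service alert and of the notification that followed it.
alert_time = next(ts for ts, kind, _ in entries if kind == "SERVICE ALERT")
notify_time = next(ts for ts, kind, _ in entries if kind == "SERVICE NOTIFICATION")

gap_seconds = int((notify_time - alert_time).total_seconds())
print(f"HARD service alert to notification: {gap_seconds} s")
```

The service alert is already logged as HARD on attempt 1, and the notification follows a little over five minutes later, which is close to the retry_interval of 5.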


We use the check_by_ssh plugin to perform active checks on a number of hosts and services.

The Nagios host is running version 4.4.6 on Rocky Linux 8.

For both types of check we use the following settings:

Code:
max_check_attempts   13
retry_interval      5


In this case there was a ~7 minute period where the host was unavailable (this is usually caused by a network problem).

There is no alert log entry for the first service check retry, but it looks like the notification was triggered after the first check retry failed, even though max_check_attempts is set to 13 and the host check was OK at this point.
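
As a rough check of that expectation, the delay before a HARD state can be worked out from the two settings above. This is only a sketch in Python under stated assumptions: that every soft-state retry is actually used, that the first check counts as attempt 1, and that interval_length is at its default of 60 seconds (that setting is not shown in this post).

```python
# Values taken from the settings quoted above.
max_check_attempts = 13
retry_interval = 5  # expressed in units of interval_length

# Assumption: interval_length is left at its default of 60 seconds,
# so retry_interval is effectively in minutes.
interval_length_seconds = 60

# The first failed check is attempt 1, so the remaining attempts are retries.
retries = max_check_attempts - 1
seconds_to_hard = retries * retry_interval * interval_length_seconds
minutes_to_hard = seconds_to_hard / 60

print(f"{retries} retries, about {minutes_to_hard:.0f} minutes before a HARD state")
```

Under those assumptions a notification would not be expected until about an hour after the first failed check, which is why one arriving roughly five minutes after the first failure is surprising.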

If you need any more information, just let me know.

Thanks in advance.

Re: Service notification - why?

Post by invade » Mon Aug 15, 2022 3:44 am

Just to add that I have now enabled the “host_down_disable_service_checks” option, but I can still see service checks being run (and generating an alert) when the host is down, e.g.:

Code:
2022-08-15 00:56:29+01:00 HOST ALERT: host.example.com;DOWN;SOFT;1;UNKNOWN - Plugin timed out
2022-08-15 00:56:38+01:00 SERVICE ALERT: host.example.com;Samba;UNKNOWN;HARD;1;UNKNOWN - Plugin timed out