Nagios checking services after Host down

Posted: Sun Dec 07, 2014 10:31 pm
by Fred Kroeger
I "thought" that if Nagios detected a Host Down, that it would suspend checking the services for that host until it came up again?
The screenshot below from the Event Log shows that the Host Alert came in first at 08:08, followed by an alert for each service over the next 12 minutes.
Then one of the services reported that it recovered just before the Host Up was logged.
[Attachment: Events.png]
The notification suppression for the Services worked OK, as I only got a notification for the Host Down. However, these events are also managed by a Global Event Handler, so ideally I don't want Nagios to raise Critical Service alerts when a host goes down. I would have expected them to go to UNKNOWN.
I'm running NagiosXI 2014R1.4.

Regards.... Fred

Re: Nagios checking services after Host down

Posted: Mon Dec 08, 2014 2:07 pm
by lmiltchev
Fred Kroeger wrote:I "thought" that if Nagios detected a Host Down, that it would suspend checking the services for that host until it came up again?
This is not correct. Imagine a scenario, where ping/icmp checks are disabled on a host by the firewall. The host would be up and you would still want to be checking services, regardless of the fact that nagios is showing the host as "down"...

Re: Nagios checking services after Host down

Posted: Mon Dec 08, 2014 8:37 pm
by Fred Kroeger
Not following the logic here?
If ping/icmp is disabled then I wouldn't be using that to monitor the host. Nagios would be reporting that it is down, so the info on the Nagios screen would be false? My SLA reports would be wrong, the executive summary would be wrong, etc.
In this situation I would use check_dummy or negate or some other check that would at least show a true state.

Currently, when a Host goes down, I get a ticket created *plus* 20 other tickets for each service that goes CRITICAL.
If a Host is showing DOWN, then the Service Monitors should at least show UNKNOWN? They are not CRITICAL, as the critical threshold hasn't been reached.

Re: Nagios checking services after Host down

Posted: Mon Dec 08, 2014 9:19 pm
by Box293
There are a couple of things to talk about here. Some of this goes back to basics but it's easier to have an example to discuss scenarios with.

1) How long it takes for the host to go into a hard state and become "down" compared to how long it takes your services to go into a hard state and become "down".

Example:

Host
check_interval = 5
max_check_attempts = 3
retry_interval = 2

Service(s)
check_interval = 2
max_check_attempts = 3
retry_interval = 1

1.10pm - Host is checked and detected as UP, next check is 1.15pm
1.11pm - Host goes down, Nagios does not know about it yet
1.12pm - Service check fails, retry interval is 1 so next attempt is 1.13pm (soft state)
1.13pm - Service check retry fails, retry interval is 1 so next attempt is 1.14pm (soft state)
1.14pm - Service check fails, max_check_attempts reached so alert is sent (hard state)
1.15pm - Host check fails, retry interval is 2 so next attempt is 1.17pm (soft state)
more service checks happening / retrying / alerting
1.17pm - Host check fails, retry interval is 2 so next attempt is 1.19pm (soft state)
more service checks happening / retrying / alerting
1.19pm - Host check fails, max_check_attempts reached so alert is sent (hard state)
No more service alerts will be sent until the host recovers

Basically, the point I am making here is that your services need to take longer to reach a hard state than the host does, so the combination of service check interval / retry interval / max check attempts needs to exceed that of the host. Once the host goes into a hard down state, service checks will continue to be executed, however no notifications will be sent.
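
As a rough sketch of that timing point, the fragment below reverses the example so that the host can reach a hard DOWN state before any service reaches a hard CRITICAL state. The object names (web01, HTTP), the address, the check_http command and the generic-host / generic-service templates are assumptions for illustration only, and the intervals assume the default interval_length of 60 seconds.

```cfg
define host {
    use                  generic-host      ; assumed template supplying the other required directives
    host_name            web01             ; hypothetical host
    address              192.0.2.10        ; placeholder address
    check_interval       1                 ; check every minute
    retry_interval       1                 ; retry every minute while in a soft state
    max_check_attempts   3                 ; hard DOWN on the 3rd failed attempt
}

define service {
    use                  generic-service   ; assumed template
    host_name            web01
    service_description  HTTP              ; hypothetical service
    check_command        check_http        ; assumed to be defined elsewhere
    check_interval       5                 ; check every 5 minutes
    retry_interval       2                 ; retry every 2 minutes while in a soft state
    max_check_attempts   4                 ; hard CRITICAL on the 4th failed attempt
}
```

With these values, and ignoring check execution time, the host needs roughly 3 minutes at most to reach a hard DOWN state (up to 1 minute until the first failed check, then two retries 1 minute apart), while a service needs at least 6 minutes after its first failed check (three retries 2 minutes apart).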

2) Host and Service Dependencies
Dependencies are a great way to stop checks from being scheduled / executed when something goes down. However if I remember correctly, you can't make services depend on a host object. To get around this you create a service that you depend on.

You can create a ping service for that host and then create a service dependency so that all other services on that host depend on that ping service. In the dependency you define which states of the ping service (the service being depended on) stop the dependent services from being executed on their next schedule.

Once that ping service goes into a hard critical state, all other service checks that depend on it will not be executed, and hence their state will remain as per the last time the check ran. Once that ping service goes into a hard OK state, all other service checks will be allowed to execute again.
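
A minimal sketch of such a dependency might look like the following. The host name web01 and the service descriptions PING and HTTP are hypothetical; only the directive names are standard servicedependency directives.

```cfg
define servicedependency {
    host_name                      web01   ; host that owns the ping service (hypothetical)
    service_description            PING    ; the service being depended on
    dependent_host_name            web01   ; host that owns the dependent service
    dependent_service_description  HTTP    ; the service that depends on PING
    execution_failure_criteria     c,u     ; do not run HTTP checks while PING is CRITICAL or UNKNOWN
    notification_failure_criteria  c,u     ; do not notify for HTTP while PING is CRITICAL or UNKNOWN
}
```

One definition like this would be needed for each dependent service, or for a group of them, depending on how the configuration is organised.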
lmiltchev wrote:Imagine a scenario, where ping/icmp checks are disabled on a host by the firewall. The host would be up and you would still want to be checking services, regardless of the fact that nagios is showing the host as "down"...
I completely get what you're saying. From a different perspective, if it was only the ping/icmp packets being denied by the firewall and other service checks were still executing OK, it would help in your troubleshooting. It's really an open-ended debate ... it all depends on your environment.

Re: Nagios checking services after Host down

Posted: Tue Dec 09, 2014 12:26 am
by Fred Kroeger
Hi Troy
All my timings are right - check the screenshot I appended to my original post.
The Host Alert (Hard) preceded the first Hard Service alert by 2 minutes.

Yes you can't have a service dependency depend on a host check. I found it amusing that you suggested that I use a ping service for the dependency when the whole rationale of the service checks not being disabled on a host down was because people still wanted to monitor services when icmp/ping is blocked by a firewall! ;-)

Nagios is already providing this type of functionality with "blocking outages" by the use of Parents. Why not use it at the Host level?
If Nagios knows that the host is down, why would it bother checking all the 20-plus services configured for that host? Not checking them would reduce the amount of checking and notifications required.

regards... Fred

Re: Nagios checking services after Host down

Posted: Tue Dec 09, 2014 1:00 am
by zaji_nms
Dear Nagios Team

We also reported same issue....you have to work hard.

Our post: "Stop Service Checks When Host Down" ... zaji_nms » Mon Dec 30, 2013 9:45
http://support.nagios.com/forum/viewtop ... 16&t=24484

When a Host is Down, Service Alerts should not come in; there is not even a need to set them to UNKNOWN (Nagios should give users an option to customize this).

In our monitoring style/setup, the very first condition, HOST should be pingable/reachable, then will check next issue.

Regards


MOD EDIT - added link to your post

Re: Nagios checking services after Host down

Posted: Tue Dec 09, 2014 4:54 pm
by Box293
The best I can do here is ask that you post a feature request on the Nagios Tracker.

With this being a function of Core, not specific to XI, it should be created here: http://tracker.nagios.org

Taking end users' feedback is part of what makes Open Source so good :)

Re: Nagios checking services after Host down

Posted: Thu Dec 11, 2014 10:34 pm
by Fred Kroeger
Issue 657 logged on tracker

Re: Nagios checking services after Host down

Posted: Fri Dec 12, 2014 11:05 am
by cmerchant
Fred, Thanks for posting that feature request on tracker. I'll go ahead and close this for now.

Re: Nagios checking services after Host down

Posted: Wed Aug 19, 2015 4:40 pm
by Box293
Fred,
The feature request has been implemented in Nagios Core 4.1.0 and is the host_down_disable_service_checks option now available in nagios.cfg.

When Core 4.1.0 is integrated into Nagios XI, you will be able to enable this functionality.
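
For reference, enabling the option would presumably look like the line below in the main configuration file. This is a sketch that assumes Nagios Core 4.1.0 or later and a value of 1 to turn the behaviour on; check the sample nagios.cfg shipped with your version for the exact semantics.

```cfg
# nagios.cfg (main configuration file)
# Assumption: 1 enables the option, 0 leaves service checks running while the host is down
host_down_disable_service_checks=1
```

A restart of Nagios would be needed for a change to nagios.cfg to take effect.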