Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more.
define host {
    name                           generic-host
    notifications_enabled          1
    event_handler_enabled          1
    flap_detection_enabled         1
    failure_prediction_enabled     1
    process_perf_data              1
    retain_status_information      1
    retain_nonstatus_information   1
    notification_period            24x7
    register                       0
    notification_interval          0
    notification_options           d,u,r
    check_period                   24x7
    check_interval                 5    ; Actively check the host every 5 minutes
    retry_interval                 5    ; Schedule host check retries at 5 minute intervals
    max_check_attempts             12   ; Check each host 12 times before going to HARD state
    passive_checks_enabled         1    ; Passive host checks are enabled/accepted
}
I then define some hosts using this template. The hosts also have a few simple services defined.
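For reference, a minimal sketch of such a host and service definition might look like the following (the host name, address, check command, and contact are placeholders, not taken from the original post):

```cfg
define host {
    use         generic-host        ; inherit from the template above
    host_name   web01               ; hypothetical host name
    address     192.0.2.10          ; RFC 5737 documentation address
}

define service {
    host_name             web01
    service_description   PING
    check_command         check_ping!100.0,20%!500.0,60%
    check_interval        1
    retry_interval        1
    max_check_attempts    4
    check_period          24x7
    notification_interval 0
    notification_period   24x7
    contacts              nagiosadmin
}
```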
In this case, whenever a host goes down, I would expect it to reach the HARD DOWN state after roughly 5 × 12 = 60 minutes. However, since Nagios 4.0.0 (tested on 4.0.7 and 4.1.1), it appears that when the host is unreachable, the host check is re-executed at every service check, causing the host HARD state to be reached much more quickly.
I can confirm that this worked fine on Nagios 3.5.1. Is this a regression since 4.0.0? Or can I restore the old behaviour with a configuration option?
On-demand checks are performed whenever Nagios sees a need to obtain the latest status information about a particular host or service. For example, when Nagios is determining the reachability of a host, it will often perform on-demand checks of parent and child hosts to accurately determine the status of a particular network segment. On-demand checks also occur in the predictive dependency check logic in order to ensure Nagios has the most accurate status information.
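The parent-probing reachability logic described above can be sketched conceptually as follows. This is a toy Python model, not the actual Nagios Core C code; the `Host` class and `check` callback are illustrative stand-ins:

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    state: str                      # simulated "real" state of the host
    parents: list = field(default_factory=list)

def check(host):
    return host.state               # stand-in for an active host check

def determine_host_state(host, check):
    """When a host check fails, probe parents to tell DOWN from UNREACHABLE."""
    if check(host) == "UP":
        return "UP"
    # Host check failed: run on-demand checks of each parent host to
    # decide whether this host itself is DOWN or merely UNREACHABLE
    # behind a failed network segment.
    for parent in host.parents:
        if check(parent) == "UP":
            # At least one parent is reachable, so the problem is local.
            return "DOWN"
    return "UNREACHABLE" if host.parents else "DOWN"

# Gateway up, web host down -> the web host itself is DOWN.
gw = Host("gateway", "UP")
web = Host("web01", "DOWN", parents=[gw])
print(determine_host_state(web, check))   # DOWN
```

If the gateway were also down, no parent would answer and the sketch would return `UNREACHABLE` instead, which mirrors the documented DOWN-versus-UNREACHABLE distinction.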
However, this behavior was also present in the 3.x versions, so I'm not sure why it would have changed. Does this sound like the behavior you are seeing?
I had indeed read about the on-demand checks. They also happen on 3.5.1, but there they don't seem to increment the counter that counts towards the HARD state. I can confirm that for certain tomorrow.
It does sound like a bug to me. This behaviour basically means that the check interval and retry interval timers become meaningless as soon as you assign a service to a host, because any service check forces an increment of the host's check attempt counter, regardless of when the next host check is actually scheduled.
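The effect on the attempt counter can be illustrated with a small toy model. This is not Nagios code; the service count and check frequency are assumed numbers chosen to show the difference in scale:

```python
# Toy model: minutes until a DOWN host reaches the HARD state, with and
# without service-triggered on-demand host checks bumping the attempt
# counter. All parameters are illustrative.

def minutes_to_hard(max_check_attempts, retry_interval,
                    ondemand_checks_per_minute=0):
    attempts = 1            # the first failed check puts the host in SOFT DOWN
    minutes = 0.0
    while attempts < max_check_attempts:
        if ondemand_checks_per_minute:
            # The next counter bump comes from whichever fires first:
            # the scheduled retry or an on-demand check.
            step = min(retry_interval, 1.0 / ondemand_checks_per_minute)
        else:
            step = retry_interval
        minutes += step
        attempts += 1
    return minutes

# Expected (3.x) behaviour: only scheduled retries count.
print(minutes_to_hard(12, 5))                                 # 55.0 minutes
# Reported 4.x behaviour: four services checked once a minute each
# trigger an on-demand host check that increments the counter.
print(minutes_to_hard(12, 5, ondemand_checks_per_minute=4))   # 2.75 minutes
```

With only scheduled retries, the 12-attempt threshold takes close to the expected hour; with a handful of busy services on the host, the HARD state arrives in a few minutes, matching what the thread describes.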
I don't think this is the same issue. Your issue is about the host state not going HARD immediately when max_check_attempts is set to 1. The issue I'm having is about the attempt counter increasing when on-demand checks are executed after a service state change.
There's not a ton I can do from a support perspective to change this behavior; it's pretty far into the dev realm to decide whether this is a bug (most likely they will agree it is) and then to fix it.
Great to hear it! I didn't have time yesterday to look too deeply into the code, but it looks like it would be a pretty minor change. It's still up to the devs to decide if this is going to make it in, but they can always make it a configurable option as a compromise.
I'll be closing this thread now, but feel free to open another if you need anything in the future!