Nagios Support Forum

Posted: **Mon Jan 13, 2014 9:58 am**

You can try switching your max_host_check_spread to 30 minutes instead of 1 minute in nagios.cfg and then restarting Nagios. This is what that value does:

This option determines the maximum number of minutes from when Nagios starts that all hosts (that are scheduled to be regularly checked) are checked. This option will automatically adjust the host inter-check delay method (if necessary) to ensure that the initial checks of all hosts occur within the timeframe you specify. In general, this option will not have an effect on host check scheduling if scheduling information is being retained using the use_retained_scheduling_info option. Default value is 30 (minutes).

Posted: **Thu Jan 16, 2014 7:55 pm**

Giving that a shot. Will post results soon. Thanks!

Posted: **Fri Jan 17, 2014 11:47 am**

Great! Keep us up to date.

Posted: **Fri Jan 17, 2014 3:13 pm**

No joy, I'm afraid. The change to that setting had no effect whatsoever.

With the settings below, I expected Nagios to retry every 2 minutes after SOFT DOWN 1. Instead, it ran through all 8 check attempts in just over 3 minutes.
Test host: Ash. Test settings: check_interval 1, retry_interval 2, max_check_attempts 8

Host Down[2014-01-17 12:03:37] HOST ALERT: ash;DOWN;HARD;8;CRITICAL - 192.168.10.212: Host unreachable @ 192.168.10.161. rta nan, lost 100%
Host Down[2014-01-17 12:02:50] HOST ALERT: ash;DOWN;SOFT;7;CRITICAL - 192.168.10.212: Host unreachable @ 192.168.10.161. rta nan, lost 100%
Host Down[2014-01-17 12:02:33] HOST ALERT: ash;DOWN;SOFT;6;CRITICAL - 192.168.10.212: Host unreachable @ 192.168.10.161. rta nan, lost 100%
Host Down[2014-01-17 12:02:22] HOST ALERT: ash;DOWN;SOFT;5;CRITICAL - 192.168.10.212: Host unreachable @ 192.168.10.161. rta nan, lost 100%
Host Down[2014-01-17 12:01:38] HOST ALERT: ash;DOWN;SOFT;4;CRITICAL - 192.168.10.212: Host unreachable @ 192.168.10.161. rta nan, lost 100%
Host Down[2014-01-17 12:00:46] HOST ALERT: ash;DOWN;SOFT;3;CRITICAL - 192.168.10.212: Host unreachable @ 192.168.10.161. rta nan, lost 100%
Host Down[2014-01-17 12:00:30] HOST ALERT: ash;DOWN;SOFT;2;CRITICAL - 192.168.10.212: rta nan, lost 100%
Host Down[2014-01-17 12:00:18] HOST ALERT: ash;DOWN;SOFT;1;CRITICAL - 192.168.10.212: rta nan, lost 100%
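
For anyone who wants to quantify the problem, here is a rough Python sketch that parses the alert lines above and prints the gap between consecutive attempts. It assumes interval_length is at its default of 60 seconds, so retry_interval 2 should mean 120 seconds between retries. The function names are my own, and the plugin output is trimmed from each line to keep it short.

```python
import re
from datetime import datetime

# Alert lines from the log above, plugin output trimmed after the address.
LOG = """\
[2014-01-17 12:03:37] HOST ALERT: ash;DOWN;HARD;8;CRITICAL - 192.168.10.212
[2014-01-17 12:02:50] HOST ALERT: ash;DOWN;SOFT;7;CRITICAL - 192.168.10.212
[2014-01-17 12:02:33] HOST ALERT: ash;DOWN;SOFT;6;CRITICAL - 192.168.10.212
[2014-01-17 12:02:22] HOST ALERT: ash;DOWN;SOFT;5;CRITICAL - 192.168.10.212
[2014-01-17 12:01:38] HOST ALERT: ash;DOWN;SOFT;4;CRITICAL - 192.168.10.212
[2014-01-17 12:00:46] HOST ALERT: ash;DOWN;SOFT;3;CRITICAL - 192.168.10.212
[2014-01-17 12:00:30] HOST ALERT: ash;DOWN;SOFT;2;CRITICAL - 192.168.10.212
[2014-01-17 12:00:18] HOST ALERT: ash;DOWN;SOFT;1;CRITICAL - 192.168.10.212
"""

# Assumption: interval_length=60 (the default), so retry_interval 2 = 120 s.
EXPECTED_RETRY_SECONDS = 2 * 60

ALERT = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] HOST ALERT: "
    r"[^;]+;\w+;(\w+);(\d+);"
)


def parse_alerts(text):
    """Return (attempt, timestamp, state_type) tuples sorted by attempt."""
    alerts = []
    for stamp, state_type, attempt in ALERT.findall(text):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        alerts.append((int(attempt), when, state_type))
    return sorted(alerts)


def retry_gaps(alerts):
    """Return (attempt, seconds since the previous attempt) pairs."""
    return [
        (cur[0], (cur[1] - prev[1]).total_seconds())
        for prev, cur in zip(alerts, alerts[1:])
    ]


if __name__ == "__main__":
    for attempt, gap in retry_gaps(parse_alerts(LOG)):
        note = "too fast" if gap < EXPECTED_RETRY_SECONDS else "ok"
        print(f"attempt {attempt}: {gap:.0f}s after previous ({note})")
```

Every gap comes out well under the 120 seconds I expected, and the gaps are uneven, so this does not look like a simple unit mix-up between seconds and minutes.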

I think the solution to this is a fresh deployment of Nagios 4.x with a vanilla nagios.cfg. But until I get to that, which will be a couple of weeks at least, I'm willing to try other tests if anyone can think of something.

Cheers!

Posted: **Mon Jan 20, 2014 10:32 am**

We are going to try to draw up plans for some more tests on your core system. Since the cause is still a mystery at this point, I agree that you should roll out a new system and copy your configurations over so that it mirrors the current one.

Posted: **Wed Feb 12, 2014 7:45 pm**

Still haven't rolled out the newer version, but yes, that's the plan right now. If in the meantime you'd like me to try anything else, that would be OK.

Thanks for all the effort =)

-t

Posted: **Thu Feb 13, 2014 10:23 am**

Hmm, what do you have in the "broker_module=" section of your nagios.cfg?
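
If it helps, something like the following Python sketch would list the active broker_module lines. The path is only an assumption (the usual location for a source install), so adjust it for your system; the helper name is made up for illustration.

```python
from pathlib import Path


def broker_modules(cfg_text):
    """Return the value of every uncommented broker_module= directive."""
    modules = []
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only; module arguments may contain "=" too.
        key, sep, value = line.partition("=")
        if sep and key.strip() == "broker_module":
            modules.append(value.strip())
    return modules


if __name__ == "__main__":
    # Assumed path; change it to wherever your nagios.cfg lives.
    cfg_path = Path("/usr/local/nagios/etc/nagios.cfg")
    if cfg_path.exists():
        for module in broker_modules(cfg_path.read_text()):
            print(module)
    else:
        print(f"{cfg_path} not found")
```

Commented-out entries are skipped, so the output shows only the modules Nagios actually loads.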

host rechecks too fast
