host rechecks too fast

slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: host rechecks too fast

Post by slansing »

You can try setting max_host_check_spread to 30 minutes instead of 1 minute in nagios.cfg and then restarting Nagios. This is what that value does:
This option determines the maximum number of minutes from when Nagios starts that all hosts (that are scheduled to be regularly checked) are checked. This option will automatically adjust the host inter-check delay method (if necessary) to ensure that the initial checks of all hosts occur within the timeframe you specify. In general, this option will not have an effect on host check scheduling if scheduling information is being retained using the use_retained_scheduling_info option. Default value is 30 (minutes).
pfarthing6
Posts: 8
Joined: Wed Nov 27, 2013 7:53 pm

Re: host rechecks too fast

Post by pfarthing6 »

Giving that a shot. Will post results soon. Thanks!
slansing

Re: host rechecks too fast

Post by slansing »

Great! Keep us up to date.
pfarthing6

Re: host rechecks too fast

Post by pfarthing6 »

No joy I'm afraid. The change to that setting had no effect whatsoever.

The expectation with the settings below would be that after SOFT DOWN 1, Nagios should retry every 2 minutes. Instead it ran through all 8 check attempts within a period of about 3 minutes.
Test host: Ash. Test settings: check_interval 1, retry_interval 2, max_check_attempts 8

[2014-01-17 12:03:37] HOST ALERT: ash;DOWN;HARD;8;CRITICAL - 192.168.10.212: Host unreachable @ 192.168.10.161. rta nan, lost 100%
[2014-01-17 12:02:50] HOST ALERT: ash;DOWN;SOFT;7;CRITICAL - 192.168.10.212: Host unreachable @ 192.168.10.161. rta nan, lost 100%
[2014-01-17 12:02:33] HOST ALERT: ash;DOWN;SOFT;6;CRITICAL - 192.168.10.212: Host unreachable @ 192.168.10.161. rta nan, lost 100%
[2014-01-17 12:02:22] HOST ALERT: ash;DOWN;SOFT;5;CRITICAL - 192.168.10.212: Host unreachable @ 192.168.10.161. rta nan, lost 100%
[2014-01-17 12:01:38] HOST ALERT: ash;DOWN;SOFT;4;CRITICAL - 192.168.10.212: Host unreachable @ 192.168.10.161. rta nan, lost 100%
[2014-01-17 12:00:46] HOST ALERT: ash;DOWN;SOFT;3;CRITICAL - 192.168.10.212: Host unreachable @ 192.168.10.161. rta nan, lost 100%
[2014-01-17 12:00:30] HOST ALERT: ash;DOWN;SOFT;2;CRITICAL - 192.168.10.212: rta nan, lost 100%
[2014-01-17 12:00:18] HOST ALERT: ash;DOWN;SOFT;1;CRITICAL - 192.168.10.212: rta nan, lost 100%
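
To put numbers on the gap between expected and observed, here is a rough Python sketch that computes the spacing between the eight alerts above and compares it with what retry_interval 2 should give. It assumes interval_length in nagios.cfg is at its default of 60 seconds, which has not been confirmed for this system; the names retry_gaps, ALERTS, RETRY_INTERVAL and INTERVAL_LENGTH are made up for the illustration.

```python
from datetime import datetime

# Timestamps of the eight HOST ALERT entries above, oldest first.
ALERTS = [
    "2014-01-17 12:00:18",
    "2014-01-17 12:00:30",
    "2014-01-17 12:00:46",
    "2014-01-17 12:01:38",
    "2014-01-17 12:02:22",
    "2014-01-17 12:02:33",
    "2014-01-17 12:02:50",
    "2014-01-17 12:03:37",
]

RETRY_INTERVAL = 2    # retry_interval from the test settings
INTERVAL_LENGTH = 60  # ASSUMPTION: interval_length left at its default (seconds)


def retry_gaps(stamps):
    """Return the number of seconds between each consecutive pair of timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


gaps = retry_gaps(ALERTS)
expected_gap = RETRY_INTERVAL * INTERVAL_LENGTH

print("observed gaps (s): ", gaps)
print("observed total (s):", sum(gaps))
print("expected gap (s):  ", expected_gap)
print("expected total (s):", expected_gap * len(gaps))
```

Under that assumption, the seven retries should have been 120 seconds apart (840 seconds from first alert to HARD state), while the observed gaps run from 11 to 52 seconds and add up to 199 seconds.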

I think the solution to this is a fresh deployment of Nagios 4.x with a vanilla nagios.cfg. But until I get to that, which will be a couple of weeks at least, I'm willing to try other tests if anyone can think of something.

Cheers!
slansing

Re: host rechecks too fast

Post by slansing »

We are going to try to draw up plans for some more tests on your core system. Since the cause is still a mystery at this point, I agree that you should roll out a new system and copy your configurations over so it mirrors the current one.
pfarthing6

Re: host rechecks too fast

Post by pfarthing6 »

Still haven't rolled out the newer version, but yes, that's the plan right now. If in the meantime you'd like me to try anything else, that would be OK.

Thanks for all the effort =)

-t
slansing

Re: host rechecks too fast

Post by slansing »

Hmm, what do you have on the "broker_module=" lines of your nagios.cfg?
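
If it helps, a small Python sketch along these lines can pull the relevant directives out of nagios.cfg in one go. It is only an illustration: read_directives and SAMPLE are hypothetical names, the sample values and the broker module path are invented, and you would feed it the contents of your own nagios.cfg instead.

```python
def read_directives(cfg_text, names):
    """Collect the values of the given key=value directives, skipping comments."""
    found = {name: [] for name in names}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in found:
            # A directive such as broker_module may appear more than once.
            found[key].append(value.strip())
    return found


# HYPOTHETICAL sample content; replace with the text of your own nagios.cfg.
SAMPLE = """
# main config
interval_length=60
max_host_check_spread=30
use_retained_scheduling_info=1
#broker_module=/path/to/disabled_module.o
broker_module=/path/to/example_module.o config_file=/path/to/example.cfg
"""

WANTED = ["broker_module", "interval_length",
          "max_host_check_spread", "use_retained_scheduling_info"]

result = read_directives(SAMPLE, WANTED)
for name in WANTED:
    print(name, "=", result[name])
```

An empty list for broker_module would mean no event broker modules are loaded; anything listed there is worth posting back here along with the interval_length value.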