
reload appears to cause skip of remaining attempts

Posted: Mon Jun 17, 2013 12:46 pm
by mckslim
Running 3.4.1:
I'm seeing a strange anomaly: a host check is partway through its retries and has not yet reached max_attempts, but after a Nagios reload the next check is immediately forced to DOWN;HARD;1, as seen here:

[2013-06-04 08:40:21] HOST ALERT: 5gt4;DOWN;SOFT;1;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
[2013-06-04 08:47:18] HOST ALERT: 5gt4;DOWN;SOFT;2;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
[2013-06-04 08:54:03] HOST ALERT: 5gt4;DOWN;SOFT;3;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
(reload happens here at 09:00)
[2013-06-04 09:00:52] HOST ALERT: 5gt4;DOWN;HARD;1;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.

Why is it skipping the rest of the attempts and going straight to DOWN;HARD after the reload?
Seems like a bug to me.
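For reference, if max_attempts is N, the expected sequence is SOFT;1 through SOFT;N-1 and then HARD;N, so a problem alert that goes HARD with an attempt number below max_attempts is the symptom described above. A minimal shell sketch for spotting such lines in nagios.log follows. The function name flag_early_hard is hypothetical; it assumes the HOST ALERT layout shown above (host;STATE;SOFT|HARD;attempt;output) with no semicolons in the host name, and it ignores UP alerts because only problem states going HARD early matter here. Treat its matches as lines worth a closer look, not as proof of a bug.

```shell
# Hypothetical helper: print HOST ALERT lines that reached a HARD problem
# state at an attempt number lower than the given max_attempts.
# Assumed log layout (as in the excerpt above):
#   [time] HOST ALERT: host;STATE;SOFT|HARD;attempt;plugin output
flag_early_hard() {
    awk -v max="$1" '
        /HOST ALERT: / {
            # Take everything after "HOST ALERT: " (12 characters).
            rec = substr($0, index($0, "HOST ALERT: ") + 12)
            split(rec, f, ";")
            # f[1]=host  f[2]=state  f[3]=state type  f[4]=attempt
            if (f[2] != "UP" && f[3] == "HARD" && f[4] + 0 < max + 0)
                print "early HARD: host=" f[1] " attempt=" f[4] " of " max
        }
    '
}
```

Usage, assuming max_attempts is 4 and substituting the path configured by log_file in nagios.cfg: `flag_early_hard 4 < /path/to/nagios.log`. With the four alerts quoted above, only the 09:00:52 line would be reported.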

Re: reload appears to cause skip of remaining attempts

Posted: Mon Jun 17, 2013 2:10 pm
by abrist
Do you have "initial_state" set on the object?

Re: reload appears to cause skip of remaining attempts

Posted: Mon Jun 17, 2013 3:01 pm
by mckslim
'initial_state' is not set to anything.
Remember that this is happening on a reload (I haven't examined what happens on a full restart).
Thanks.

Re: reload appears to cause skip of remaining attempts

Posted: Mon Jun 17, 2013 4:08 pm
by abrist
Was this host configured for downtime? I ask because there were a number of bugs related to flexible downtime and hard states.

Re: reload appears to cause skip of remaining attempts

Posted: Mon Jun 17, 2013 4:17 pm
by mckslim
No scheduled downtime was in effect when this problem occurred.

Re: reload appears to cause skip of remaining attempts

Posted: Mon Jun 17, 2013 5:04 pm
by abrist
How many check attempts are set on this check?
Do you have more than 1 nagios parent process running?

ps -aef | grep nagios.cfg

Re: reload appears to cause skip of remaining attempts

Posted: Mon Jun 17, 2013 6:02 pm
by mckslim
max_attempts is 4

output:
$ ps -aef | grep nagios.cfg
nagios 5726 1 0 Jun14 ? 00:41:14 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
nagios 25239 5726 0 22:58 ? 00:00:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
nagios 25330 5726 0 22:58 ? 00:00:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
nagios 25436 5726 0 22:59 ? 00:00:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
nagios 25444 5726 0 22:59 ? 00:00:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
nagios 25446 5726 0 22:59 ? 00:00:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
nagios 25448 5726 0 22:59 ? 00:00:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
nagios 25465 5726 0 22:59 ? 00:00:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
nagios 25492 5726 0 22:59 ? 00:00:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
nagios 25533 5726 0 22:59 ? 00:00:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
nagios 25544 5726 0 22:59 ? 00:00:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
nagios 25563 5726 0 22:59 ? 00:00:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
nagios 25615 5726 0 22:59 ? 00:00:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg

Re: reload appears to cause skip of remaining attempts

Posted: Tue Jun 18, 2013 1:12 pm
by abrist
The process list looks fine, as they are all children of the same parent process. Do you only experience this issue when restarting Nagios?
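The parent/child check described here can be scripted. The sketch below uses a hypothetical helper, count_nagios_parents, that reads pid, ppid and args columns (for example from `ps -eo pid=,ppid=,args=`, a different layout from the `ps -aef` output posted above) and counts the nagios processes whose parent is not itself a nagios process. It assumes every daemon's command line mentions nagios.cfg, as in this thread. Applied to the processes in the listing above, the count would be 1 (PID 5726, the parent of all the other entries).

```shell
# Hypothetical helper: count top-level nagios daemons.
# Input: "pid ppid args" lines on stdin, e.g. from: ps -eo pid=,ppid=,args=
# A process is top-level when its parent PID is not another nagios process,
# so worker children of a single daemon are not counted.
count_nagios_parents() {
    awk '
        /nagios\.cfg/ {
            n++; pid[n] = $1; ppid[n] = $2; is_nagios[$1] = 1
        }
        END {
            top = 0
            for (i = 1; i <= n; i++)
                if (!(ppid[i] in is_nagios)) top++
            print top
        }
    '
}
```

Usage: `ps -eo pid=,ppid=,args= | count_nagios_parents`. A result of 1 matches the healthy case; a result above 1 would suggest a second daemon is running alongside the first.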

Re: reload appears to cause skip of remaining attempts

Posted: Tue Jun 18, 2013 3:23 pm
by mckslim
I only find this happening when Nagios is reloaded.

Re: reload appears to cause skip of remaining attempts

Posted: Tue Jun 18, 2013 3:31 pm
by abrist
I would suggest opening a bug at tracker.nagios.org, but you should probably first update to the newest version and try that before you file the bug.