
Host's current attempt goes to 1 when in hard state

Posted: Tue Jun 03, 2014 4:27 pm
by niebais
We've found that when a host goes into a hard state, its current_attempt changes to 1 on the next check. Can you confirm whether this is a bug? Is there a setting that triggers this behavior? It doesn't happen for services. We see it in both nagios3 and nagios4.

The example below shows the host's attempts increasing. Once it hits the hard state, current_attempt goes to 1 after the next check.

Command used to view the state:

Code:

[brianc@xi1 ~]$ echo -e "GET hosts\nColumns: host_name current_attempt max_check_attempts state state_type hard_state\n" | /usr/local/bin/unixcat 
Output:

Code:

child1;1;5;1;0;0
child1;2;5;1;0;0
child1;3;5;1;0;0
child1;4;5;1;0;0
child1;5;5;1;1;1
child1;1;5;1;1;1
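
A minimal sketch for spotting this pattern automatically, assuming the same column order as the query above. The function name flag_reset_hosts is made up for illustration (it is not part of Nagios or Livestatus), and the sample rows are copied from the output above; in practice you would pipe the unixcat output into it instead.

```shell
#!/bin/sh
# flag_reset_hosts (hypothetical helper): reads Livestatus rows from stdin.
# Assumed column order, matching the query above:
#   host_name;current_attempt;max_check_attempts;state;state_type;hard_state
# Prints hosts that are in a HARD state (state_type = 1) and not UP
# (state != 0) while current_attempt is below max_check_attempts.
flag_reset_hosts() {
    awk -F';' '$5 == 1 && $4 != 0 && $2 < $3 {
        print $1 ": attempt " $2 "/" $3 " in hard state " $4
    }'
}

# Sample rows copied from the output above.
printf '%s\n' \
    'child1;4;5;1;0;0' \
    'child1;5;5;1;1;1' \
    'child1;1;5;1;1;1' | flag_reset_hosts
```

With the sample rows, only the last one is reported: the row where the attempt has dropped back to 1/5 while state_type is still 1.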

Re: Host's current attempt goes to 1 when in hard state

Posted: Wed Jun 04, 2014 12:51 pm
by tmcdonald
That looks like expected behavior according to this doc:

http://nagios.sourceforge.net/docs/3_0/statetypes.html

Re: Host's current attempt goes to 1 when in hard state

Posted: Wed Jun 04, 2014 5:25 pm
by niebais
Interesting. That table seems to line up with what I see for hosts but not for services.

Example:

Command:

Code:

[root@xi1 brianc]# echo -e "GET services\nColumns: host_name service_description current_attempt max_check_attempts state state_type\n" | /usr/local/bin/unixcat /usr/local/nagios/var/rw/live | grep "^child2;depend1"
Results:

Code:

child2;depend1;1;2;2;0
child2;depend1;2;2;2;1
child2;depend1;2;2;2;1
child2;depend1;2;2;2;1

Re: Host's current attempt goes to 1 when in hard state

Posted: Thu Jun 05, 2014 1:04 pm
by slansing
Is the above behavior happening when the service's dependent host is in a down state? If so, this may offer insight:

As always, there are exceptions to the rules. When a service check results in a non-OK state, Nagios will check the host that the service is associated with to determine whether or not it is up (see the note below for info on how this is done). If the host is not up (i.e. it is either down or unreachable), Nagios will immediately put the service into a hard non-OK state and it will reset the current attempt number to 1. Since the service is in a hard non-OK state, the service check will be rescheduled at the normal frequency specified by the check_interval option instead of the retry_interval option.
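
To make the quoted rule concrete, here is a rough sketch of how a non-OK service result would be classified. This is only a simplified model of the rule, not the Nagios Core code. The function name classify_non_ok_result and its arguments are hypothetical, and the two branches for the host-up case assume the usual retry logic (soft until max_check_attempts is reached, then hard), which matches the service output shown earlier in this thread.

```shell
#!/bin/sh
# classify_non_ok_result (hypothetical helper, not Nagios code).
# Arguments: host_up (1 or 0), current attempt number, max_check_attempts.
# Prints: "<state_type> <attempt> <interval used for the next check>"
classify_non_ok_result() {
    host_up=$1
    attempt=$2
    max_attempts=$3
    if [ "$host_up" -eq 0 ]; then
        # Documented exception: host is down or unreachable, so the service
        # goes straight to a hard state and the attempt resets to 1.
        echo "HARD 1 check_interval"
    elif [ "$attempt" -ge "$max_attempts" ]; then
        # Assumed normal case: retries exhausted, attempt stays at max.
        echo "HARD $attempt check_interval"
    else
        # Assumed normal case: still retrying at the retry interval.
        echo "SOFT $attempt retry_interval"
    fi
}

classify_non_ok_result 0 1 2   # host down
classify_non_ok_result 1 1 2   # host up, first failure
classify_non_ok_result 1 2 2   # host up, retries exhausted
```

If the services in question follow the last two cases (attempt staying at 2/2 in a hard state), the host-down exception is not what is being hit.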


Is this happening across the board? What happens if you submit a passive UP state to one of the hosts whose services show this behavior, and then disable active checking on that host to keep it locked in that state?

Re: Host's current attempt goes to 1 when in hard state

Posted: Thu Jun 05, 2014 2:58 pm
by niebais
The parent is in an UP state.

I turned off active checks on the host and submitted a passive UP to the host and I get the same behavior on the services.

Do you see this on your side? I see this on multiple instances of nagios; there isn't a system where I haven't seen this behavior. I prefer the behavior of services, where the current attempt stays at the max attempts once it goes into a hard state. I want to know the reasoning for the host's current attempt going to 1. It's not consistent with services, which is why we and our customers have noticed it.

Do you know whereabouts in the code this is happening? I'm looking for a starting point or hint so I can debug it and try to get some more information.

Thanks for your help!

Re: Host's current attempt goes to 1 when in hard state

Posted: Thu Jun 05, 2014 3:09 pm
by scottwilkerson
I don't have livestatus installed but do see the same in the UI.

Host: [screenshot: host.PNG]
Service: [screenshot: service.PNG]
To be honest, I've never noticed this before, and I don't know that I would have if you hadn't mentioned it.

I can file a bug report and have the Core developers take a look at it.

Re: Host's current attempt goes to 1 when in hard state

Posted: Thu Jun 05, 2014 3:59 pm
by niebais
Cool. I originally saw it in the UI/XI, but was using livestatus for the report.

We discovered this by looking at the Hosts Details pages and wondering why the hosts were critical with an attempt of 1/5. We thought something was up until we saw that the state history was correct. I know that the current attempt doesn't always correlate to a hard state (e.g. when a service's host is down, or with dependencies), but this one seemed off.

Thanks for submitting the bug and looking into this.

Re: Host's current attempt goes to 1 when in hard state

Posted: Fri Jun 06, 2014 9:23 am
by slansing
We'll try to get in contact when it is fixed or when we have additional information. This thread's address should be in the bug report.