Hey guys,
One of my users raised an interesting question today that I don't quite know the answer to, so I could use some clarification. What is supposed to happen if you schedule triggered downtime starting "now" on a host and all of its child hosts when the host is already down?
In this case the host went into scheduled downtime but the child hosts did not. I'm not sure if this is a bug or intended behaviour. Technically there was no state change to "trigger" the downtime, which makes it seem intended. In this particular case the user should have used non-triggered downtime anyway, but it does raise an interesting point about the semantics.
Thanks!
Triggered downtime
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Triggered downtime
Actually, my understanding is that the "trigger" in triggered downtime refers to downtime for a particular host "triggering" downtime for its children.
It does not refer to downtime starting when the host first goes down; that would fall under "flexible" downtime vs. fixed.
In flexible downtime (with or without triggers), the downtime WILL NOT start (nor will it trigger a child's downtime) until the host goes from an UP state to a DOWN state. If it is already DOWN, it will not start the downtime.
This is expected behavior. If the host is already DOWN you should not use flexible downtime: it is designed for cases where you know the host is up, are unsure of when it is going to go down, and want to start the downtime timer from the point it goes down.
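To make the fixed vs. flexible distinction concrete, here is a minimal sketch of the rule as described in this reply. It is a toy model, not Nagios Core's actual logic; the function name `downtime_start` and the shape of `state_changes` are made up for illustration.

```python
from typing import List, Optional, Tuple


def downtime_start(
    fixed: bool,
    window_start: int,
    window_end: int,
    state_changes: List[Tuple[int, str]],
) -> Optional[int]:
    """Return the time downtime begins, or None if it never starts.

    state_changes is a time-ordered list of (timestamp, new_state)
    transitions. A host that was already DOWN before the window opened
    produces no transition inside it. (Hypothetical model of the behavior
    described in this thread.)
    """
    if fixed:
        # Fixed downtime starts at the scheduled time regardless of state.
        return window_start
    for timestamp, new_state in state_changes:
        # Flexible downtime waits for a transition to DOWN inside the window.
        if window_start <= timestamp <= window_end and new_state == "DOWN":
            return timestamp
    return None


# Host goes DOWN at t=150 inside a 100..500 window.
print(downtime_start(False, 100, 500, [(150, "DOWN")]))
# Host was already DOWN before t=100, so there is no transition to see.
print(downtime_start(False, 100, 500, []))
```

Under this model, a flexible downtime scheduled on an already-down host never starts, so any child downtime triggered by it would not start either.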
Re: Triggered downtime
I may have poorly worded the question.
The options were set as:
Type: Fixed
Child Hosts: Schedule triggered downtime for all child hosts
The host was already down when the downtime was set. The host entered scheduled downtime (as expected) but none of the children did. The host itself is not the concern in this question; it behaved as you would expect. The child hosts are the concern.
If you set "Schedule non-triggered downtime for all child hosts", all of the child hosts enter downtime, again as expected. However, if the target host is already down and you set "Schedule triggered downtime for all child hosts", the host itself enters downtime but none of the children do?
I think maybe that will clarify the question?
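For anyone who wants to reproduce this outside the web UI, here is a rough sketch of the equivalent external commands. It assumes the two "Child Hosts" options correspond to the Core external commands `SCHEDULE_AND_PROPAGATE_TRIGGERED_HOST_DOWNTIME` and `SCHEDULE_AND_PROPAGATE_HOST_DOWNTIME`; the helper name, host name, author, and comment are placeholders.

```python
import time
from typing import Optional


def downtime_command(
    host: str,
    start: int,
    end: int,
    *,
    fixed: bool = True,
    triggered_children: bool = True,
    author: str = "admin",
    comment: str = "maintenance",
    now: Optional[int] = None,
) -> str:
    """Build one external command line for host + child downtime.

    Field order: host;start;end;fixed;trigger_id;duration;author;comment.
    (Hypothetical helper; the UI-option-to-command mapping is an assumption.)
    """
    stamp = int(time.time()) if now is None else now
    name = (
        "SCHEDULE_AND_PROPAGATE_TRIGGERED_HOST_DOWNTIME"
        if triggered_children
        else "SCHEDULE_AND_PROPAGATE_HOST_DOWNTIME"
    )
    # trigger_id 0: the parent host's own downtime is not triggered by another.
    fields = [name, host, start, end, int(fixed), 0, end - start, author, comment]
    return f"[{stamp}] " + ";".join(str(field) for field in fields)


# Fixed downtime on the parent, triggered downtime for its children.
print(downtime_command("router1", 1000, 4600, now=900))
```

The resulting line would be appended to the Nagios external command file, whose path depends on the install. Comparing the scheduled downtimes after each variant should show whether the children pick it up.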
Re: Triggered downtime
Hmm, that does clarify the question, but I just ran a couple of tests with this exact scenario, and the child hosts go into downtime right away...
This is in 2012R2.0 XI with Core 3.5.0 under the hood.
Not sure what version you are running, but 2012R2.0+ of XI ships a patched version of Core that includes the following fix (which could be related):
- Patched Nagios Core 3.5.0 Fixed bug #445: Adding triggered downtime for child hosts causes a SIGSEGV on restart/reload (Eric Stanley) -SW
Re: Triggered downtime
At the time the user noticed this we were running r1.7... over the weekend we upgraded to r2.2, so it could very well be related to that bug. That answers my question though about how it should be handled, so I'll be sure to let you know if I ever see it again.
Thanks Scott!