Re: External Commands Logic.
Posted: Mon May 06, 2013 5:01 pm
by samuel
My intention is that if the ping service is down and the host is up, notifications are enabled, but if the host goes down as well, notifications are disabled. I need the host to overrule the service, which it does if the service goes down first. The problem is that when Nagios actively checks them, the host might be checked first and then the service.
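For illustration only, here is a minimal Python sketch of the rule described above, written so the result depends on the two current states rather than on which check ran first. The function names, the state strings, and the target host/service are assumptions made for this example and are not taken from the actual setup. The command names `ENABLE_SVC_NOTIFICATIONS` and `DISABLE_SVC_NOTIFICATIONS` are standard Nagios external commands.

```python
import time


def decide_action(host_state, service_state):
    """Return 'DISABLE', 'ENABLE', or None for the rule in the post.

    host_state and service_state are assumed to be plain strings such as
    'UP'/'DOWN' and 'OK'/'CRITICAL' (hypothetical inputs for this sketch).
    """
    if host_state == "DOWN":
        # The host overrules the service: host down means notifications off.
        return "DISABLE"
    if service_state == "CRITICAL":
        # Host is up but the ping service is down: notifications on.
        return "ENABLE"
    # Both are fine; the post does not say what should happen here.
    return None


def format_command(action, target_host, target_service, now=None):
    """Build one line in the Nagios external command file format."""
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] {action}_SVC_NOTIFICATIONS;{target_host};{target_service}\n"
```

The line returned by `format_command` would then be written to the Nagios external command file, whose location depends on the installation.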
Re: External Commands Logic.
Posted: Tue May 07, 2013 1:40 pm
by abrist
You may have to rework the logic, as there is no good way in Nagios to force some checks to be scheduled before others. Is the service check an ICMP ping check to the host behind the router?
Re: External Commands Logic.
Posted: Wed May 08, 2013 10:16 am
by samuel
In the real situation it will be, but currently I am testing this specific occurrence, where both the host and the service are down, using a host and a ping service that pings that host. It makes it easier to test when they are both down.
There are four different situations that can happen:
nagios1 goes down and router1 stays up.
nagios1 goes down, then router1 goes down.
router1 goes down but nagios1 stays up.
router1 goes down, then nagios1 goes down.
Re: External Commands Logic.
Posted: Thu May 09, 2013 9:51 am
by scottwilkerson
samuel wrote: I need the host to overrule the service, which it does if the service goes down first.
This is usually all taken care of automatically if your max_check_attempts * retry_interval is as long as or longer than the regular check interval of the hosts...
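As a rough sketch of that rule of thumb, the comparison can be written out in Python. The function name and the sample values are hypothetical, and all intervals are assumed to be in the same unit (minutes with the default Nagios interval_length of 60 seconds).

```python
def host_is_checked_in_time(service_max_check_attempts,
                            service_retry_interval,
                            host_check_interval):
    """True if the service's soft-state window covers a regular host check.

    Implements the comparison from the post:
    max_check_attempts * retry_interval >= host check interval.
    """
    soft_window = service_max_check_attempts * service_retry_interval
    return soft_window >= host_check_interval


# Illustrative values only: 5 attempts at 1-minute retries against a
# 5-minute host check interval satisfies the rule.
print(host_is_checked_in_time(5, 1, 5))
# With both values set to 1, the window is too short for a 5-minute interval.
print(host_is_checked_in_time(1, 1, 5))
```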
Re: External Commands Logic.
Posted: Fri May 10, 2013 9:06 am
by samuel
Currently the max check attempts and retry interval are both set to one, but that is just for the test.
Usually max check attempts is 5 and the retry interval is 1.
So your fix would be to either raise the max check attempts for the host or lower the max check attempts for the service?
Re: External Commands Logic.
Posted: Fri May 10, 2013 3:37 pm
by sreinhardt
I think you want to invert that logic. The host check interval should be shorter than the service's retry interval * max check attempts, so that the host can stop alerts on a service.
Re: External Commands Logic.
Posted: Fri May 10, 2013 4:28 pm
by samuel
It would stop the alerts, but if the service goes into a critical state after the host goes down, it would enable alerts.
Re: External Commands Logic.
Posted: Fri May 10, 2013 4:32 pm
by samuel
Would this work?
service:
check interval: 5
retry interval: 1
max checks: 3
host:
check interval: 5
retry interval: 1
max checks: 5
It gives the host two more minutes before going into a hard state.
Should I make it at least five or seven minutes more, to make sure that the majority of the time the host is the last to go into a hard state?
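The two-minute figure can be checked with a small Python sketch. It uses a simplified model: the first failed check counts as attempt 1, the remaining attempts run at the retry interval, both problems are first detected at the same moment, and intervals are in minutes. The function name is hypothetical and check latency is ignored.

```python
def minutes_to_hard_state(max_check_attempts, retry_interval):
    """Minutes from the first failed check until the hard state is reached."""
    # Attempt 1 is the first failure; each further attempt waits one retry interval.
    return (max_check_attempts - 1) * retry_interval


service_minutes = minutes_to_hard_state(3, 1)
host_minutes = minutes_to_hard_state(5, 1)
print("service:", service_minutes, "host:", host_minutes,
      "margin:", host_minutes - service_minutes)

# Margin for some other host max check values, service settings unchanged.
for host_max_checks in (5, 8, 10):
    margin = minutes_to_hard_state(host_max_checks, 1) - service_minutes
    print(host_max_checks, "host checks ->", margin, "minutes more than the service")
```

Under these assumptions, a host max checks value of 8 or 10 would give the five or seven extra minutes asked about.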
Re: External Commands Logic.
Posted: Mon May 13, 2013 10:34 am
by slansing
If you would rather have the services go into a hard state before the host, then yes, the above would work in that scenario. It's really down to how you want to be alerted, and when.
Re: External Commands Logic.
Posted: Mon May 13, 2013 12:09 pm
by samuel
The aim is to have as few duplicate alerts as possible.
If the router goes down, I don't want to be alerted on all the unreachable hosts on the other side.
With this logic I can make that happen, while making sure that if one Nagios server goes down it will be picked up by the other.