nscott wrote: That's a good point, I'll look into that. But are your problems resolved?
Kinda? Not really? It appears to be a timing problem: services on hosts behind a router are scheduled for a check before the router itself is checked. As soon as the router is in 'critical', the remaining hosts/services behind it are not checked. In the case of an outage this afternoon, I got ten (out of 24) alerts before the router outage stopped the flow.
Ok. Now that's expected logic. It may seem broken at first, but it would cause quite a bit of stress on the target system if, every time a service returned critical, it ran a check on the host before sending a notification. A way around this problem would be to increase the number of times the service check must return critical before a notification is sent.
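As a rough sketch of that workaround, assuming a Nagios-style object configuration: the host name, service description, template name, check command arguments, and all numeric values below are made up for illustration and are not taken from the poster's setup. The directive that controls how many consecutive non-OK results are needed before a notification goes out is max_check_attempts.

```cfg
# Illustrative only: names and values are assumptions, not the poster's real config.
define service {
    use                     generic-service                  ; assumed template name
    host_name               far-host-01                      ; hypothetical host behind the router
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%   ; assumed command and thresholds
    check_interval          5    ; normal checks every 5 time units (minutes by default)
    retry_interval          1    ; after a non-OK result, recheck every 1 time unit
    max_check_attempts      4    ; notify only after 4 consecutive non-OK results
}
```

With these example values the service would stay in a soft state for about three minutes after the first critical result (three retries, one minute apart) before a notification is sent, which gives the router check more time to go critical first.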
I have the 'far hosts' checking every five minutes and the routers every one minute (there are only 34 of them). A router goes critical in five minutes and the 'far hosts' in ten. We'll see how that goes.