
Re-check with another nagios server if critical ?

Posted: Sun Oct 30, 2011 2:43 pm
by Mykeul
Hello,

I have a central nagios server, with 3 gearman workers.
When a check is "critical", I would like to check a second time with another worker (or nagios), to be sure it is not a local problem.

Is this possible ?

Thanks for your help

Re: Re-check with another nagios server if critical ?

Posted: Thu Nov 03, 2011 3:12 pm
by Mykeul
Hello,

I am amazed no one seems to do that ... it is important, isn't it ?
On a worldwide distributed architecture, one poller can lose its link to a monitored host while that host is still up for everyone else.

Please help :)

M

Re: Re-check with another nagios server if critical ?

Posted: Thu Nov 03, 2011 5:40 pm
by jsmurphy
I think the problem is you need to further explain your setup... I don't know what gearman is, how you have set it up or what your requirements are.

Taking a wild stab in the dark, you could set up a different check for each of the workers and then use service dependencies http://nagios.sourceforge.net/docs/3_0/ ... dependency to tell it to suppress alerts when the other checks are ok. Hopefully this helps :)

Re: Re-check with another nagios server if critical ?

Posted: Fri Nov 04, 2011 2:31 pm
by Mykeul
Hello,

Thanks for replying. You are right, I had not realized that my question was not explained well enough.
In fact, forget the gearman workers, those are only nagios pollers.

To simplify, let's say I have 1 nagios in China and 1 in USA. I am monitoring a server located in France.
Due to worldwide network latencies/problems, sometimes the China nagios says the France server is down, but the USA nagios says it is OK. The reality is that the France server is OK.

So, I would like the China server (or a master, or whatever) to ask the USA server (or a master, or whatever) to check the France server, and then change the state to critical (and notify) only when the 2 nagios say it is down.
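
To make the rule concrete, here is a minimal sketch in Python of the decision I have in mind. It is not an existing Nagios feature: `local_check` and `remote_check` are hypothetical callables standing in for "run the check from here" and "ask the other site to run it", and downgrading to WARNING when the sites disagree is just one possible choice.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def confirmed_state(local_check, remote_check):
    """Return CRITICAL only when both vantage points report CRITICAL.

    local_check and remote_check are hypothetical callables that each
    return a Nagios exit code. remote_check is only called when the
    local result is CRITICAL.
    """
    local = local_check()
    if local != CRITICAL:
        # Nothing to confirm: pass the local result through unchanged.
        return local
    remote = remote_check()
    if remote == CRITICAL:
        # Both sites agree that the host or service is down.
        return CRITICAL
    # Assumption: a disagreement is reported as WARNING, since the problem
    # is probably on the network path from the first site.
    return WARNING
```

For example, `confirmed_state(lambda: CRITICAL, lambda: OK)` gives WARNING, and the second site is never contacted while the local check is fine.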

I don't mind changing the plugins/nebs/scripts to fit the need; this is important.

Thanks for your help

Mykeul

Re: Re-check with another nagios server if critical ?

Posted: Sun Nov 06, 2011 6:49 pm
by jsmurphy
I think my preferred solution in this instance would be just to improve the fault tolerance: require more retry checks before the state is treated as a real problem, or leave more time between retry checks. But that really depends on how fast you need to react if there is a problem.
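
As a rough way to see the trade-off, the sketch below estimates how long a failure has to persist before Nagios puts the check into a HARD state, using the `max_check_attempts` and `retry_interval` directives. It assumes the default `interval_length` of 60 seconds and ignores check execution time and scheduling latency, so treat the numbers as approximate.

```python
def minutes_until_hard_state(max_check_attempts, retry_interval, interval_length=60):
    """Approximate delay between the first failed check and the HARD state.

    The first failure counts as attempt 1, so only the remaining
    attempts are spaced retry_interval time units apart.
    """
    retries = max(max_check_attempts - 1, 0)
    # retry_interval is expressed in units of interval_length seconds.
    return retries * retry_interval * interval_length / 60


print(minutes_until_hard_state(3, 1))  # 3 attempts, 1 minute apart
print(minutes_until_hard_state(5, 2))  # 5 attempts, 2 minutes apart
```

With 3 attempts one minute apart the alert is delayed by about 2 minutes; with 5 attempts two minutes apart it is about 8 minutes, which rides out longer network blips at the cost of slower notification.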

There is no easy way of accomplishing what you want; there isn't even a good way that I know of. If you really wanted to do this you could potentially jury rig a solution using NSCA, a passive service and event handlers... but you would probably be over-engineering a solution for a problem that may not really need it.
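
If you did go the NSCA route, the piece an event handler would need is something that turns a result into the tab-separated line that `send_nsca` reads on standard input (host, service description, return code, plugin output). This is only a sketch: the host and service names are made up, and the config path in `submit` is an assumption that depends on your install.

```python
import subprocess


def nsca_service_line(host, service, return_code, output):
    """Build one passive service result in the form send_nsca reads."""
    # Tabs or newlines inside the plugin output would break the record,
    # so flatten them to spaces.
    clean = output.replace("\t", " ").replace("\n", " ")
    return f"{host}\t{service}\t{return_code}\t{clean}\n"


def submit(line, nsca_host, config="/etc/send_nsca.cfg"):
    """Pipe one result line to send_nsca (config path is an assumption)."""
    subprocess.run(
        ["send_nsca", "-H", nsca_host, "-c", config],
        input=line.encode(),
        check=True,
    )
```

The second Nagios would run its own check from the event handler and call something like `submit(nsca_service_line("france-web", "HTTP", 2, "timeout"), "central-nagios")` to feed a passive service on the central server, which is the only service that notifies.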