Anyone using Check_Cluster?

Posted: Mon May 02, 2011 10:53 am
by jtata
I've been using the check_cluster plugin to monitor aggregate site availability for a while now. My monitoring is set up like this:

Host - mywebsite.com
> ServiceA - check_http > mywebsite
> ServiceB - check_http > mywebsite (via NRPE on machine in different location)
> ServiceC - check_cluster > ServiceA and ServiceB

This has been working flawlessly for a couple of months. If the website is down from either location I get a WARNING from check_cluster; if it is down from both, I get a CRITICAL. Over the weekend, however, I got some really odd results. The availability reports for that 24-hour period show ~99% uptime for A and B, but only around 50% availability for the C check, across all hosts. It looks like we experienced some downtime, but the cluster check never realized that the two component checks had recovered.
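
To make the expected behavior concrete, here is a minimal Python sketch of the aggregation I'm describing. It is not the plugin's actual code: the function name `cluster_state` and the `warn_at`/`crit_at` parameters are made up for illustration, and the real check_cluster takes its thresholds as command-line arguments rather than as simple counts like this.

```python
# Nagios state IDs used for services in this sketch.
OK, WARNING, CRITICAL = 0, 1, 2


def cluster_state(component_states, warn_at=1, crit_at=2):
    """Aggregate component service states into one cluster state.

    warn_at / crit_at are hypothetical thresholds: the number of
    non-OK components needed to raise WARNING or CRITICAL.
    """
    # Count every component that is not in the OK state.
    non_ok = sum(1 for state in component_states if state != OK)
    if non_ok >= crit_at:
        return CRITICAL
    if non_ok >= warn_at:
        return WARNING
    return OK


if __name__ == "__main__":
    # ServiceA and ServiceB both up.
    print(cluster_state([OK, OK]))
    # Site down from one location only.
    print(cluster_state([OK, CRITICAL]))
    # Site down from both locations.
    print(cluster_state([CRITICAL, CRITICAL]))
```

With both components back at OK, the aggregate should return to OK as soon as the cluster check next runs with the current component states, which is what did not happen over the weekend.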

Any advice on what could be going on here?


Required Info:
NagiosXI 2009R1.3G on VM app.

Re: Anyone using Check_Cluster?

Posted: Mon May 02, 2011 11:36 am
by rdedon
Hello,
Had anything changed prior to the odd results over the weekend?

Thank you.

Re: Anyone using Check_Cluster?

Posted: Mon May 02, 2011 12:20 pm
by jtata
Nothing I'm aware of. Just to be sure, I went through the availability reports from our last downtime and didn't see any discrepancy between the site checks and the cluster check.

Re: Anyone using Check_Cluster?

Posted: Mon May 02, 2011 4:17 pm
by rdedon
Hmm, could you try restarting and see if it is monitoring normally after?

Re: Anyone using Check_Cluster?

Posted: Tue May 03, 2011 9:51 am
by jtata
My cluster checks were already reporting the correct state; they just took an entire day to recover after the individual services did. I did restart, but I'm not sure how to test whether it's working without causing something to go down.

Re: Anyone using Check_Cluster?

Posted: Tue May 03, 2011 10:12 am
by rdedon
Could you give this a read and see if it is applicable:
http://community.nagios.org/2009/06/18/ ... -ok-state/?