Anyone using Check_Cluster?
Posted: Mon May 02, 2011 10:53 am
I've been using the check_cluster plugin to monitor aggregate site availability for a while now. My monitoring is set up like this:
Host - mywebsite.com
> ServiceA - check_http > mywebsite
> ServiceB - check_http > mywebsite (via NRPE on machine in different location)
> ServiceC - check_cluster ServiceA and ServiceB
This has been working flawlessly for a couple of months. If the website is down from either location, I get a WARNING from check_cluster; if it is down from both locations, I get a CRITICAL. However, over the weekend I got some really odd results. The availability reports for that 24-hour period show ~99% uptime for A and B, but only around 50% availability for the C check, across all hosts. It looks like we did experience some downtime, but the cluster check never registered that the two component checks had recovered.
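For reference, here is a rough sketch of the aggregation the cluster check is expected to do with two members. check_cluster doesn't poll ServiceA/ServiceB itself. It only counts the state codes handed to it when it runs (typically via on-demand macros such as $SERVICESTATEID:host:service$) and compares the number of non-OK members to its warning/critical thresholds. The function name `cluster_state` and the plain "alert if count exceeds N" thresholds are simplifying assumptions, roughly equivalent to running the plugin with `-w 0 -c 1`. The real plugin accepts full Nagios range syntax.

```python
# Nagios service state IDs: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
OK, WARNING, CRITICAL = 0, 1, 2


def cluster_state(state_ids, warn_max=0, crit_max=1):
    """Hypothetical helper: aggregate component state IDs into one state.

    Simplified sketch: counts non-OK members and alerts when the count
    exceeds warn_max (WARNING) or crit_max (CRITICAL). With two members,
    one failure -> WARNING, two failures -> CRITICAL.
    Returns (aggregate_state, non_ok_count).
    """
    non_ok = sum(1 for s in state_ids if s != OK)
    if non_ok > crit_max:
        return CRITICAL, non_ok
    if non_ok > warn_max:
        return WARNING, non_ok
    return OK, non_ok


if __name__ == "__main__":
    # ServiceA and ServiceB state IDs as the cluster check might receive them
    for a, b in [(0, 0), (2, 0), (2, 2)]:
        state, count = cluster_state([a, b])
        print(f"A={a} B={b} -> state {state} ({count} non-OK)")
```

Because the result depends only on the state codes passed in at check time, the cluster service should return to OK on its next run after both components recover.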
Any advice on what could be going on here?
Required Info:
NagiosXI 2009R1.3G on VM app.