I've been using the check_cluster plugin to monitor aggregate site availability for a while now. My monitoring is set up like this:
Host - mywebsite.com
> ServiceA - check_http > mywebsite
> ServiceB - check_http > mywebsite (via NRPE on machine in different location)
> ServiceC - check_cluster on ServiceA and ServiceB
This has been working flawlessly for a couple of months. If the website is down from either location I get a WARNING from check_cluster; if it is down from both, I get a CRITICAL. Over the weekend, however, I got some really odd results. The availability reports for that 24-hour period show ~99% uptime for A and B, but only around 50% availability for the C check, across all hosts. It looks like we experienced some downtime, but the cluster check never registered that the two component checks had recovered.
Any advice on what could be going on here?
Required Info:
Nagios XI 2009R1.3G on the VM appliance.
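To make the expected behaviour concrete, here is a minimal Python sketch of the decision I expect from the cluster check: count the member services that are not OK and compare that count against a warning and a critical threshold. The function name `cluster_state` and the parameters `warn_at` and `crit_at` are made up for illustration. This is not the plugin's actual code or command-line syntax.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def cluster_state(member_states, warn_at=1, crit_at=2):
    """Return the aggregate state for a list of member service states.

    warn_at / crit_at are hypothetical names for the number of non-OK
    members needed to raise WARNING / CRITICAL.
    """
    non_ok = sum(1 for state in member_states if state != OK)
    if non_ok >= crit_at:
        return CRITICAL
    if non_ok >= warn_at:
        return WARNING
    return OK


# ServiceA and ServiceB both OK: the cluster should be OK.
print(cluster_state([OK, OK]))
# One location failing: the cluster should be WARNING.
print(cluster_state([CRITICAL, OK]))
# Both locations failing: the cluster should be CRITICAL.
print(cluster_state([CRITICAL, CRITICAL]))
```

With this logic the cluster result depends only on the member states it is handed at the time it runs, so a cluster check that stays non-OK after both members recover suggests it was still being given stale member states.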
Anyone using Check_Cluster?
Re: Anyone using Check_Cluster?
Hello,
Had anything changed prior to the odd results over the weekend?
Thank you.
Re: Anyone using Check_Cluster?
Nothing that I'm aware of. Just to be sure, I went through the availability reports from our last downtime and didn't see any discrepancy between the site checks and the cluster check.
Re: Anyone using Check_Cluster?
Hmm, could you try restarting and see if it is monitoring normally after?
Re: Anyone using Check_Cluster?
My cluster checks were already reporting the correct state; they just took an entire day to recover after the individual services did. I did restart, but I'm not sure how to test whether it's working without causing something to go down.
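One option that might avoid real downtime is to submit a fake passive result for one of the member services through the Nagios external command file and then watch whether the cluster check follows it down and back up. The sketch below only builds the `PROCESS_SERVICE_CHECK_RESULT` command line and writes it to the command file. The path in `COMMAND_FILE` is an assumption (check `command_file` in nagios.cfg), the helper names are made up, and the service has to accept passive checks for this to have any effect.

```python
import time

# Assumption: a common default path; the real one is the command_file
# setting in nagios.cfg and may differ on the XI appliance.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_passive_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time()) if now is None else int(now)
    return "[{0}] PROCESS_SERVICE_CHECK_RESULT;{1};{2};{3};{4}\n".format(
        timestamp, host, service, return_code, output
    )


def submit_passive_result(line, command_file=COMMAND_FILE):
    """Write the command line to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as cmd:
        cmd.write(line)


# Example: pretend ServiceA went CRITICAL (return code 2).
line = format_passive_result("mywebsite.com", "ServiceA", 2, "simulated outage")
print(line, end="")
# submit_passive_result(line)  # uncomment on the Nagios server itself
```

The next active check of ServiceA should overwrite the simulated state, which would also show how long the cluster check takes to notice the recovery.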
Re: Anyone using Check_Cluster?
Could you give this a read and see if it is applicable:
http://community.nagios.org/2009/06/18/ ... -ok-state/?