Nagios Core - check_cluster
Posted: Fri Jun 15, 2012 7:13 am
Hello again, I received a request from a staff member asking whether Nagios can monitor the up/down state of resources across a cluster. Here's his e-mail:
"We had an issue this morning where Microsoft cluster services failed underneath an MS-SQL cluster. Though the individual nodes were up, the cluster name was down, and services were impaired.
Is it possible to specify an alert when a DNS name doesn’t respond to pings, without an agent on the host? In this case, the actual node’s IP will be different from the one on which the cluster name responds.
Basically, is Nagios able to monitor the up/down state of resources across the cluster? For those, we’d want to know if a specific service or device was not available on either node (they could be on either node). It would be a normal condition for some services to only be running on one node at a time, and we wouldn’t want to be alerted every time one of those services was found down on the inactive node.
Basically, the related test could be something like “specific services must be running on either node, but not both” or “the shared data disk is not available on either node”.
There’s one symptom of the cluster services being down that can help us detect the situation encountered this morning --- when the cluster name doesn’t respond to pings, but individual nodes *do* respond. I think there’s still a need for “ping {cluster-name}” capability to know when the cluster name is no longer available. I would think it would be akin to checking for “up-ness” of a server that uses a virtual hostname."
End of e-mail.
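To make the conditions in the e-mail concrete, here is a rough Python sketch of the logic as I understand it. It is not a Nagios plugin and it does not use check_cluster; the function names and the boolean inputs are my own assumptions for illustration, and how each value would actually be collected is left out. The return values follow the usual Nagios plugin exit codes (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def active_passive_state(running_on_nodes):
    """Rule: the service must be running on one node, but not both.

    running_on_nodes is a hypothetical list of booleans, one per node.
    """
    if not running_on_nodes:
        return UNKNOWN  # nothing was checked, so no verdict
    active = sum(1 for running in running_on_nodes if running)
    # Exactly one active node is the normal condition; zero or
    # more than one is a problem.
    return OK if active == 1 else CRITICAL


def shared_resource_state(available_on_nodes):
    """Rule: alert only if the shared disk is available on neither node."""
    if not available_on_nodes:
        return UNKNOWN
    return OK if any(available_on_nodes) else CRITICAL


def cluster_name_state(cluster_name_pings, node_pings):
    """Rule: alert when the cluster name does not answer pings.

    node_pings only changes the message, so the case from the e-mail
    (nodes up, cluster name down) can be told apart from a full outage.
    """
    if cluster_name_pings:
        return OK, "cluster name responds"
    if any(node_pings):
        return CRITICAL, "cluster name down, at least one node up"
    return CRITICAL, "cluster name and all nodes down"
```

With two nodes, `active_passive_state([True, False])` is the normal case, while `[False, False]` and `[True, True]` would both alert. A service found down on the inactive node alone would not.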
I've looked up information on the check_cluster command, but from what I've seen so far, Nagios requires an agent installed on the individual hosts. I just wanted to confirm with you guys that that's correct. Please let me know if you have any questions or need more information. Thanks.