
Nagios Core - check_cluster

Posted: Fri Jun 15, 2012 7:13 am
by jmillan
Hello again, I received a request from a staff member to see whether Nagios can monitor the up/down state of resources across a cluster. Here's his e-mail:

"We had an issue this morning where Microsoft cluster services failed underneath an MS-SQL cluster. Though the individual nodes were up, the cluster name was down, and services were impaired.

Is it possible to specify an alert when a DNS name doesn’t respond to pings, without an agent on the host? In this case, the actual nodes’ IP addresses will be different from the one on which the cluster name responds.

Basically, is Nagios able to monitor the up/down state of resources across the cluster? For those, we’d want to know if a specific service or device was not available on either node (they could be on either node). It would be a normal condition for some services to only be running on one node at a time, and we wouldn’t want to be alerted every time one of those services was found down on the inactive node.
For example, the related test could be something like “specific services must be running on either node, but not both” or “the shared data disk is not available on either node”.

There’s one symptom of the cluster services being down that can help us detect the situation encountered this morning --- when the cluster name doesn’t respond to pings, but individual nodes *do* respond. I think there’s still a need for “ping {cluster-name}” capability to know when the cluster name is no longer available. I would think it would be akin to checking for “up-ness” of a server that uses a virtual hostname."

End of e-mail.

I've looked up information on the check_cluster command, but from what I've seen so far, Nagios requires an agent installed on the individual hosts. I just wanted to confirm with you guys that this is correct. Please let me know if you have any questions or need more information. Thanks.

Re: Nagios Core - check_cluster

Posted: Fri Jun 15, 2012 9:47 am
by sebastiaopburnay
Hi jmillan, there are a few items in your post; I will try to address the first two.
Is it possible to specify an alert when a DNS name doesn’t respond to pings without an agent on the host?
Yes. I don't know whether you have consulted the official Nagios 3.x Core docs (http://nagios.sourceforge.net/docs/nagios-3.pdf), but ICMP (the protocol the ping command uses) is classified there as a publicly available service (see page 38), and all publicly available services can be monitored without installing an agent on the monitored host. To check the cluster by its DNS name, you just need to make sure your monitoring server can resolve that name.
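
A plain ping of the cluster name is what the standard check_ping or check_icmp plugins already do. To catch the specific failure from the e-mail (cluster name silent while the nodes still answer), a small custom plugin could correlate the two results. The following is a minimal Python sketch, not a shipped Nagios plugin; it assumes the Linux iputils `ping` (`-c` count, `-W` timeout in seconds), the conventional Nagios exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN), and hypothetical function names `ping`, `classify`, and `run_check`.

```python
import socket
import subprocess

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def ping(host: str) -> bool:
    """Send one ICMP echo request; assumes Linux iputils ping (-c count, -W timeout)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def classify(cluster_up: bool, nodes_up: list[bool]) -> tuple[int, str]:
    """Turn reachability of the cluster name and its nodes into a Nagios state."""
    if cluster_up:
        return OK, "cluster name responds to ping"
    if any(nodes_up):
        # The case from the e-mail: nodes answer, but the cluster name does not.
        return CRITICAL, "cluster name down while at least one node responds"
    return CRITICAL, "cluster name and all nodes are unreachable"


def run_check(cluster: str, nodes: list[str]) -> int:
    """Print one Nagios status line and return the matching exit code."""
    try:
        socket.gethostbyname(cluster)  # the monitoring server must resolve the name
    except socket.gaierror:
        print(f"UNKNOWN - cannot resolve {cluster}")
        return UNKNOWN
    code, message = classify(ping(cluster), [ping(node) for node in nodes])
    print(f"{LABELS[code]} - {message}")
    return code
```

A script wrapper around this would end with `sys.exit(run_check(sys.argv[1], sys.argv[2:]))`, with the Nagios command definition passing the cluster name followed by the node names.
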
is Nagios able to monitor the up/down state of resources across the cluster?

From what I understand, you do not want to install agents on the hosts. In that case you are stuck in an environment where monitoring is very limited and possibly fairly inaccurate. Without an agent you won't get much information unless you use some other protocol, API, or model to get remote data to the Nagios server.

The official Nagios 3.x Core docs (http://nagios.sourceforge.net/docs/nagios-3.pdf) also cover this need and a solution; check page 253 [Monitoring Service and Host Clusters]. It does not seem complicated, although I haven't attempted to implement this kind of monitoring yet. It is an item I will soon add to my to-do list as well.
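
Roughly, the approach described in those docs runs the check_cluster plugin against a comma-separated list of host or service state IDs (built from on-demand macros) and alerts based on how many members are in a non-OK state. The counting idea can be sketched in Python as follows; the function name `cluster_state` and the plain "at least N non-OK members" thresholds are assumptions, a simplification of the plugin's real threshold handling rather than its source.

```python
# Nagios state IDs: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
OK, WARNING, CRITICAL = 0, 1, 2


def cluster_state(state_ids: str, warn_at: int, crit_at: int) -> tuple[int, int]:
    """Return (nagios_state, non_ok_count) for a comma-separated list of state IDs.

    warn_at / crit_at mean "alert when at least this many members are non-OK";
    this is a simplification, not check_cluster's exact threshold syntax.
    """
    states = [int(s) for s in state_ids.split(",") if s.strip()]
    non_ok = sum(1 for s in states if s != OK)
    if non_ok >= crit_at:
        return CRITICAL, non_ok
    if non_ok >= warn_at:
        return WARNING, non_ok
    return OK, non_ok


# Two-node active/passive service: being down on the passive node is normal,
# so only alert when it is down on both nodes.
print(cluster_state("0,2", warn_at=2, crit_at=2))  # (0, 1)
print(cluster_state("2,2", warn_at=2, crit_at=2))  # (2, 2)
```

Setting both thresholds to the node count is what keeps the inactive node's "down" service from raising alerts, which matches the requirement in the original e-mail.
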

If that does not work for you either, then I guess you will have to develop a custom script to do the trick.

Greetings,
sebastiaopburnay

Re: Nagios Core - check_cluster

Posted: Fri Jun 15, 2012 11:01 pm
by jsmurphy
In addition to what sebastiaopburnay said, you may want to look into the Nagios BPI addon, which is a "business process addon". It lets you create complex groupings that determine when something should be considered a problem. This document explains its full capabilities: http://assets.nagios.com/downloads/nagi ... _Addon.pdf You can get it from here: http://exchange.nagios.org/directory/Ad ... I)/details

The one check you may have trouble with is "I want this service on one node or the other, but not both". With a little ingenuity, you can probably jury-rig a service dependency pointing in both directions to simulate this behaviour.
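
If the two-way dependency trick proves awkward, the "one or the other but not both" rule is also easy to express in a small custom check. Below is a minimal Python sketch that assumes the per-node running states have already been collected (for example, from existing service checks); the function name `exactly_one_active`, the node names, and the choice of WARNING for "running on both" are all hypothetical.

```python
from collections.abc import Mapping

# Conventional Nagios states.
OK, WARNING, CRITICAL = 0, 1, 2


def exactly_one_active(running: Mapping[str, bool]) -> tuple[int, str]:
    """OK when the service runs on exactly one node; alert on none or several."""
    active = sorted(node for node, up in running.items() if up)
    if len(active) == 1:
        return OK, f"running only on {active[0]}"
    if not active:
        # The shared-data-disk case from the e-mail maps here too.
        return CRITICAL, "not running on any node"
    return WARNING, "running on several nodes: " + ", ".join(active)


print(exactly_one_active({"sqlnode1": True, "sqlnode2": False}))  # (0, 'running only on sqlnode1')
print(exactly_one_active({"sqlnode1": True, "sqlnode2": True}))   # (1, 'running on several nodes: sqlnode1, sqlnode2')
```

Sorting the active node names keeps the status message stable regardless of the order in which node states were gathered.
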

There are probably a few ways to solve your problem if you spend some time exploring the more advanced options and plugins available for Nagios.

Re: Nagios Core - check_cluster

Posted: Mon Jun 18, 2012 6:35 am
by jmillan
Thanks for the help! I'm looking through the documents now and I'll work with my staff to find out the best way to tackle this.