check_cluster plugin question

Posted: Fri Jan 20, 2012 3:37 pm
by globalive.nagios
I've spent a good hour or so trying to figure it out, maybe you guys know the answer.

The check_cluster plugin is defined like this:

Code: Select all

check_cluster -s/-h -w -c -d
For example, I have a cluster with three nodes (-h), and I want to go critical if only zero or one are 'ok' (i.e. one node is down).

Code: Select all

check_cluster -h -c 0:1 -d 0,0,0
That's all well and good, but for services, you have the same options (assuming three services):

Code: Select all

check_cluster -s -c 0:1 -d 0,0,0
What if I only want to alarm if ONE of those services is down, a SPECIFIC service? The data points (-d) option only provides a total, and there doesn't seem to be a way to single out a particular service.

For example, if I am monitoring nodes, I can define -d however I want:

Code: Select all

-d 0,0,0 == -d node1.test.com,node2.test.com,node3.test.com
The plugin doesn't care what the datapoint is called. Is there any way to make it care?

Re: check_cluster plugin question

Posted: Fri Jan 20, 2012 3:54 pm
by scottwilkerson
globalive.nagios wrote:For example, if I am monitoring nodes, I can define -d however I want:

Code: Select all

-d 0,0,0 == -d node1.test.com,node2.test.com,node3.test.com
The plugin doesn't care what the datapoint is called. Is there any way to make it care?
Actually, it does matter what you put in the datapoint: the plugin is looking for status codes, and those are integers...

Going back to your example

Code: Select all

 # ./check_cluster -s -c 0:1 -d 0,0,0
CLUSTER OK: Service cluster: 3 ok, 0 warning, 0 unknown, 0 critical
3 ok, because 0 is the code for ok.
If we change the datapoints to different status codes, let's say one to a 1 and another to a 2

Code: Select all

 # ./check_cluster -s -c 0:1 -d 0,1,2
CLUSTER CRITICAL: Service cluster: 1 ok, 1 warning, 0 unknown, 1 critical
See how we now have 1 ok, 1 warning, 0 unknown, 1 critical... The status codes can be found here
http://nagiosplug.sourceforge.net/devel ... lines.html
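
To make the counting concrete, here is a minimal Python sketch of what the two runs above appear to do. It is not the plugin's source: the function name `tally_service_cluster` is made up, it only understands the simple `low:high` threshold form, and it assumes that the threshold is compared against the number of non-OK services and that a non-numeric datapoint counts as 0 (OK). Both assumptions are consistent with the output shown in this thread, but treat them as assumptions.

```python
def tally_service_cluster(data, critical_range):
    """Rough model of a service-cluster tally (hypothetical helper).

    data: the comma-separated string given to -d, e.g. "0,1,2"
    critical_range: a "low:high" string given to -c, e.g. "0:1"
    """
    labels = {0: "ok", 1: "warning", 2: "critical", 3: "unknown"}
    counts = {"ok": 0, "warning": 0, "unknown": 0, "critical": 0}

    for token in data.split(","):
        # Assumption: a token that is not a plain number counts as 0 (OK),
        # which matches the observation that names behave like zeros.
        code = int(token) if token.strip().isdigit() else 0
        if code in labels:
            counts[labels[code]] += 1

    # Assumption: the cluster is CRITICAL when the number of non-OK
    # services falls outside the low:high range.
    low, high = (int(part) for part in critical_range.split(":"))
    non_ok = counts["warning"] + counts["unknown"] + counts["critical"]
    state = "OK" if low <= non_ok <= high else "CRITICAL"

    summary = ", ".join(f"{n} {name}" for name, n in counts.items())
    return f"CLUSTER {state}: Service cluster: {summary}"


print(tally_service_cluster("0,0,0", "0:1"))
print(tally_service_cluster("0,1,2", "0:1"))
print(tally_service_cluster("node1.test.com,node2.test.com,node3.test.com", "0:1"))
```

Under these assumptions, hostnames passed to -d tally exactly like `0,0,0`, which is why the plugin seems not to care what the datapoint is called.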

Re: check_cluster plugin question

Posted: Fri Jan 20, 2012 4:03 pm
by globalive.nagios
Okay, definitely confused now. If you manually specify status...what's the point of the plugin?

If I cannot specify mysql_service as a datapoint which the plugin then checks against and returns a code which is calculated against the threshold I set, why use this plugin?

Sorry, very confused!

Re: check_cluster plugin question

Posted: Fri Jan 20, 2012 6:27 pm
by globalive.nagios
I should clarify my previous response...was on the train w. BB.


We have a mysql HA cluster service called mysql_service that runs on two nodes. I know that's what it is called because running 'clustat' returns that under the service name field (and that's the configured name in luci, too).

From NagiosXI the service is defined as 'check_nrpe$hostname$check_mysql' (not exact, but you get the idea).

On the cluster nodes I have a command defined:

Code: Select all

command[check_mysql]=/usr/lib64/nagios/plugins/check_cluster -s -c 0:0 -d mysql_service
That outputs an OK, because the service is running, making the total 1 - only go critical if the total is 0, right?

Well this command does the exact same thing:

Code: Select all

command[check_mysql]=/usr/lib64/nagios/plugins/check_cluster -s -c 0:0 -d 0
See what I mean? The '-d' portion only cares that SOMETHING is separated by commas, doesn't matter what.

To go back to my original question, the following two commands provide identical results:

Code: Select all

command[check_mysql]=/usr/lib64/nagios/plugins/check_cluster -s -c 0:0 -d mysql_service,goober_service,test_service

Code: Select all

command[check_mysql]=/usr/lib64/nagios/plugins/check_cluster -s -c 0:0 -d 0,0,0
It actually doesn't even matter if the former services are REAL, just requires something present separated by commas. Further, if I have those three services running on a single cluster, this means there is no way to just monitor one. I suppose there is a valid argument for 'you want to know if ANY cluster service goes down', but you get what I'm asking.

So, am I completely missing the point of this plugin, missing some syntax, or is there an issue here? :)

Re: check_cluster plugin question

Posted: Fri Jan 20, 2012 10:20 pm
by scottwilkerson
globalive.nagios wrote:If I cannot specify mysql_service as a datapoint which the plugin then checks against and returns a code which is calculated against the threshold I set, why use this plugin?
You can, but you just have to do it the right way... see below.
globalive.nagios wrote:It actually doesn't even matter if the former services are REAL, just requires something present separated by commas.
It actually does matter what you have there. If you put something random there, its state will never change, but if you put a macro there, its value changes whenever the state of the host or service changes...
globalive.nagios wrote:So, am I completely missing the point of this plugin, missing some syntax, or is there an issue here?
I think a little bit of both.... :D

The items that follow the -d would usually be the results of service checks of some sort that you would be passing in, not a host name.

./check_cluster -s -c 0:1 -d $ARG1$,$ARG2$,$ARG3$

From the Nagios XI Interface you might put something like this in the fields:
$ARG1$ = $SERVICESTATEID:host_name1:service_description$
$ARG2$ = $SERVICESTATEID:host_name2:service_description$
$ARG3$ = $SERVICESTATEID:host_name3:service_description$

You may understand this a little better if you take a look at "Understanding Nagios Macros and How They Work".
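
As a rough illustration of why the macros make the difference, the Python sketch below models the substitution step: the Nagios server replaces each `$SERVICESTATEID:host:service$` with the current numeric state before the plugin ever runs, so check_cluster only sees integers. This is a toy model, not how Nagios is implemented; `CURRENT_STATES`, `expand_state_macros`, and the state values are invented for the example, and the host and service names are the placeholders from the fields above.

```python
import re

# Hypothetical snapshot of service states held by the Nagios server,
# keyed by (host_name, service_description). 0 = OK, 2 = CRITICAL.
CURRENT_STATES = {
    ("host_name1", "service_description"): 0,
    ("host_name2", "service_description"): 2,
    ("host_name3", "service_description"): 0,
}

# Matches an on-demand macro of the form $SERVICESTATEID:host:service$.
MACRO = re.compile(r"\$SERVICESTATEID:([^:$]+):([^:$]+)\$")


def expand_state_macros(command_line, states):
    """Replace each state macro with the numeric state it refers to."""
    def lookup(match):
        host, service = match.group(1), match.group(2)
        return str(states[(host, service)])
    return MACRO.sub(lookup, command_line)


template = (
    "./check_cluster -s -c 0:1 -d "
    "$SERVICESTATEID:host_name1:service_description$,"
    "$SERVICESTATEID:host_name2:service_description$,"
    "$SERVICESTATEID:host_name3:service_description$"
)
expanded = expand_state_macros(template, CURRENT_STATES)
print(expanded)  # ./check_cluster -s -c 0:1 -d 0,2,0
```

The point of the sketch is that the datapoints are looked up on the Nagios server at check time, which is why the command belongs in a check run by the server rather than in a fixed command line on the cluster node.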

Re: check_cluster plugin question

Posted: Tue Jan 24, 2012 9:33 am
by globalive.nagios
Ahhh, so my real issue here is simply that this command isn't meant to be run locally, but from the Nagios server with arguments! Silly me.

I will take a look at that link and give it another shot. Thanks for your patience!

Re: check_cluster plugin question

Posted: Tue Jan 24, 2012 12:29 pm
by scottwilkerson
no problem.