
check_cluster service always returns OK

Posted: Wed May 04, 2016 6:16 am
by Rishi_igt
We are trying to set up a cluster check for a service. We have defined the check below:

[root@pp-mon01 libexec]# ./check_cluster -c 0 -d $SERVICESTATEID:pp-app01:eta_agent$,$SERVICESTATEID:pp-app02:eta_agent$
CLUSTER OK: Service cluster: 2 ok, 0 warning, 0 unknown, 0 critical

It always returns "OK". Even if I give a dummy host name such as XXXX, or any service name, it doesn't matter: the result is always OK. See below for reference:

[root@pp-mon01 libexec]# ./check_cluster -c 0 -d $SERVICESTATEID:XXXX:eta_agent$,$SERVICESTATEID:pp-app02:eta_agent$
CLUSTER OK: Service cluster: 2 ok, 0 warning, 0 unknown, 0 critical
[root@pp-mon01 libexec]# ./check_cluster -c 0 -d $SERVICESTATEID:XXXX:eta_agent$,$SERVICESTATEID:pp-app02:XXXXXXX$
CLUSTER OK: Service cluster: 2 ok, 0 warning, 0 unknown, 0 critical
[root@pp-mon01 libexec]#

Re: check_cluster service always returns OK

Posted: Wed May 04, 2016 2:14 pm
by ssax
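You cannot use Nagios macros from the command line. They are only expanded by Nagios itself when it runs a check, which means that the check_cluster plugin will not give meaningful results when you run it by hand with macros as arguments.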
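
To illustrate, here is a minimal shell sketch (my own illustration, not taken from the plugin documentation). Outside Nagios, the shell treats $SERVICESTATEID as an ordinary, unset variable, so the plugin never receives a state ID. The label test_cluster in the commented-out command is a made-up name.

```shell
#!/bin/sh
# Outside Nagios, SERVICESTATEID is just an unset shell variable.
unset SERVICESTATEID

# Same -d argument as in the original post.
data="$SERVICESTATEID:pp-app01:eta_agent$,$SERVICESTATEID:pp-app02:eta_agent$"

# Print what the plugin actually receives after shell expansion.
printf '%s\n' "$data"
# prints: :pp-app01:eta_agent$,:pp-app02:eta_agent$

# For a manual test, pass literal state IDs instead
# (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN), for example:
# ./check_cluster --service -l test_cluster -w 0 -c 1 -d 0,2,0
```

Neither item in that list contains a numeric state. The "2 ok" in the output above would be consistent with the plugin counting each such item as 0 (OK), but that part is an assumption on my side.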

Is it not working for you when you put it into XI and apply config?

I'm copying and pasting from a ticket that I had; it should give you the general idea.

1. Make sure that you are monitoring the services (PING in this example) on all servers (you can disable notifications for them). These service checks are what the check_cluster plugin uses, so they need to exist.

2. Create a new command:
- Command Name: check_service_cluster
- Command Line: $USER1$/check_cluster --service -l $ARG1$ -w $ARG2$ -c $ARG3$ -d '$ARG4$'
- Command Type: check command

3. Create the service cluster check:
- Description: PING_Cluster
- Check command: check_service_cluster
- $ARG1$: PING_Cluster
- $ARG2$: 0
- $ARG3$: 1
- $ARG4$: $SERVICESTATEID:yourhost1:PING$,$SERVICESTATEID:yourhost2:PING$,$SERVICESTATEID:yourhost3:PING$

NOTE: The hostname and the service description in $ARG4$ need to be exact (case sensitive).
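
For anyone working with the config files directly rather than the XI forms, steps 2 and 3 correspond roughly to the object definitions below. This is only a sketch: the host name yourmonitorhost and the generic-service template are placeholders I have assumed, so substitute whatever exists in your setup.

```cfg
# Command from step 2
define command {
    command_name    check_service_cluster
    command_line    $USER1$/check_cluster --service -l $ARG1$ -w $ARG2$ -c $ARG3$ -d '$ARG4$'
}

# Service from step 3
define service {
    use                     generic-service     ; assumed template name
    host_name               yourmonitorhost     ; hypothetical host that owns the cluster check
    service_description     PING_Cluster
    check_command           check_service_cluster!PING_Cluster!0!1!$SERVICESTATEID:yourhost1:PING$,$SERVICESTATEID:yourhost2:PING$,$SERVICESTATEID:yourhost3:PING$
}
```

The arguments after check_service_cluster are separated by "!" and map to $ARG1$ through $ARG4$ in the same order as in step 3.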

With this setup, whenever that service is not running on any of those servers, the cluster check would generate a critical.


Let me know if you have any questions.