Monitor service cluster

amitw
Posts: 28
Joined: Tue Jun 28, 2016 8:07 am

Post by amitw »

Hi,

I have 3 servers in a cluster. The same service is monitored on all 3 servers, but it runs on only 1 server at a time.

I'm looking for a plugin that can monitor those services and will alert only if the service is not running at all.

The minimum and maximum is 1 service on 1 server (it doesn't matter which server the service is running on).

I looked at the check_cluster plugin, but it does not eliminate the alerts from the services that are down.

Will appreciate your help.

Thanks,
Amit
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Monitor service cluster

Post by cdienger »

Are you actually looking to disable alerts for these services in a cluster, or notifications? Alerts are what you see in the web UI under Reports > Alerts and in nagios.log. Notifications are the actual email or SMS messages that get sent to people. I don't think there's any way to stop alerts, but notifications should be configurable.
amitw
Posts: 28
Joined: Tue Jun 28, 2016 8:07 am

Re: Monitor service cluster

Post by amitw »

Hi,
I'm using "Dependencies" for clusters with 2 services.
Clusters with 3 or more services can't be handled with "Dependencies".

The check_cluster plugin actually looks at services that are already being checked in Nagios.
I'm looking for a way to check my 3-service cluster in a single check.

Is there a way to achieve it?

Thanks
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Monitor service cluster

Post by ssax »

You have two options here:

1. Write your own plugin that runs all 3 checks in a single check and outputs the correct exit code / output that you want.

If you decide to go this route you can read here for more information:

https://assets.nagios.com/downloads/nag ... inapi.html

And here:

https://nagios-plugins.org/doc/guidelines.html
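
If you go the custom-plugin route, the core of it could look roughly like the sketch below. It is not an official Nagios plugin. The names cluster_status, run_node_check and main are made up for illustration, and it assumes each command-line argument is a complete per-node check command whose exit code follows the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Returning WARNING when more than one node runs the service is also an assumption, based on the "maximum is 1" requirement in the question.

```python
#!/usr/bin/env python3
"""Sketch of a single-check cluster plugin (hypothetical names throughout)."""
import subprocess
import sys

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def cluster_status(node_codes):
    """Map per-node plugin exit codes to one (exit_code, message) pair."""
    total = len(node_codes)
    running = sum(1 for code in node_codes if code == OK)
    if running == 0:
        return CRITICAL, f"CRITICAL - service running on 0 of {total} nodes"
    if running > 1:
        # Assumption: more than one active node is worth a warning
        return WARNING, f"WARNING - service running on {running} of {total} nodes, expected 1"
    return OK, f"OK - service running on 1 of {total} nodes"


def run_node_check(command):
    """Run one per-node check command and return its exit code."""
    try:
        return subprocess.run(command, capture_output=True, timeout=30).returncode
    except (OSError, subprocess.TimeoutExpired):
        return UNKNOWN


def main(argv):
    """Each argument is assumed to be one full per-node check command line."""
    commands = [arg.split() for arg in argv]
    if not commands:
        print("UNKNOWN - no node check commands given")
        return UNKNOWN
    code, message = cluster_status([run_node_check(c) for c in commands])
    print(message)
    return code

# When installed as a plugin, end the file with:
#     sys.exit(main(sys.argv[1:]))
```

Nagios only looks at the exit code and the first line of output, so the per-node results never raise alerts of their own. Only this one check does.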

OR

You can use the check_cluster plugin, let me show you how:

1. Make sure that you are monitoring the service (PING in this example) on all servers. These service checks need to exist because they are what the check_cluster plugin uses. You can disable notifications for them, which is important so that you don't get notifications when they are down.

2. Create a new command:
- Command Name: check_service_cluster
- Command Line: $USER1$/check_cluster --service -l $ARG1$ -w $ARG2$ -c $ARG3$ -d '$ARG4$'
- Command Type: check command

3. Create the service cluster check:
- Description: PING_Cluster
- Check command: check_service_cluster
- $ARG1$: PING_Cluster
- $ARG2$: 4 <- Set this to one MORE than your total number of services (3 services + 1 = 4) - We don't care about warnings in this example
- $ARG3$: 2 <- Set this to one LESS than your total number of services (3 services - 1 = 2)
- $ARG4$: $SERVICESTATEID:yourhost1:PING$,$SERVICESTATEID:yourhost2:PING$,$SERVICESTATEID:yourhost3:PING$

NOTE: The hostname and the service description in $ARG4$ need to be exact (case sensitive).

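The way this works is that the cluster check goes CRITICAL whenever the service is not running on any of the nodes, that is, when it is down on all three. The thresholds count services in a non-OK state: with -c 2 the check goes CRITICAL only when more than 2 services (all 3) are non-OK, and with -w 4 the WARNING threshold can never be reached. check_cluster uses the states of the individual service checks to determine whether there is an issue. Since you disabled notifications on the individual services, you won't get notified by them, and the cluster service is the one that does the notifying.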
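
To make the threshold arithmetic concrete, here is a simplified model in Python. It is not the check_cluster source, just a sketch that assumes plain numeric thresholds (alert when the non-OK count is above the number) and the usual $SERVICESTATEID$ values, where 0 is OK and anything else is non-OK. The function name cluster_state is made up for illustration.

```python
def cluster_state(state_ids, warn, crit):
    """Return 0 (OK), 1 (WARNING) or 2 (CRITICAL) for a list of service state IDs."""
    # Count the member services that are in any non-OK state
    non_ok = sum(1 for state in state_ids if state != 0)
    if non_ok > crit:
        return 2
    if non_ok > warn:
        return 1
    return 0


# Service active on one node, idle on the other two: normal for this cluster
print(cluster_state([0, 2, 2], warn=4, crit=2))  # prints 0

# Service down on all three nodes
print(cluster_state([2, 2, 2], warn=4, crit=2))  # prints 2
```

With 3 services the non-OK count can never exceed 4, which is why the warning value of 4 effectively switches warnings off.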

Please read here for more information:

https://assets.nagios.com/downloads/nag ... sters.html