Master/slave process monitoring

Posted: Wed Jun 30, 2021 11:18 pm
by vishal313
Hi All,

We are monitoring processes using the check_procs plugin from Nagios XI. We are connecting to the server using check_ssh.
We have a requirement to monitor two processes on one server in a master/slave configuration, where only one of the two processes is running at any given time. We need to trigger an alarm if neither of them is running.

Is there a way we can achieve this?


Regards
Vishal Dhote

Re: Master/slave process monitoring

Posted: Thu Jul 01, 2021 10:49 am
by ssax
You could set up a BPI group in Home > BPI (or create a servicegroup for the two services and use that servicegroup in BPI), with the warning threshold set to 50 and the critical threshold set to 0. Then use Configure > Configuration Wizards > BPI Wizard to monitor the status of that BPI group, so that the BPI check does the notifying for the group of services. You can disable notifications on the individual services if you don't want them to notify.

https://assets.nagios.com/downloads/nag ... BPI_v2.pdf

You could also do it with check_cluster:

1. Make sure that you are monitoring both services on the server. These service checks need to exist because the check_cluster plugin uses their states. You can disable notifications for them, which is important so that you don't get notifications whenever one of them is down.

2. Create a new command:
- Command Name: check_service_cluster
- Command Line: $USER1$/check_cluster --service -l $ARG1$ -w $ARG2$ -c $ARG3$ -d '$ARG4$'
- Command Type: check command

3. Create the service cluster check:
- Description: Service_Cluster
- Check command: check_service_cluster
- $ARG1$: Service_Cluster
- $ARG2$: 3 <- Set this to one MORE than your total number of services (2 services + 1 = 3). This is to ignore warnings; you can adjust it if you want warnings.
- $ARG3$: 0 <- Set this to zero for cases like this (both not running)
- $ARG4$: $SERVICESTATEID:yourhost1:SERVICE1$,$SERVICESTATEID:yourhost1:SERVICE2$

NOTE: The hostname and the service description in $ARG4$ need to be exact (case sensitive).
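
For reference, steps 2 and 3 could look roughly like this if written as plain Nagios object definitions rather than entered through the XI interface. This is only a sketch: yourhost1, SERVICE1 and SERVICE2 are the placeholders from above, and the generic-service template name is an assumption that may be different on your system.

```
# Sketch only: host name, service names and the template are placeholders/assumptions.
define command {
    command_name    check_service_cluster
    command_line    $USER1$/check_cluster --service -l $ARG1$ -w $ARG2$ -c $ARG3$ -d '$ARG4$'
}

define service {
    use                     generic-service    ; assumed template name
    host_name               yourhost1
    service_description     Service_Cluster
    check_command           check_service_cluster!Service_Cluster!3!0!$SERVICESTATEID:yourhost1:SERVICE1$,$SERVICESTATEID:yourhost1:SERVICE2$
}
```

The arguments after check_service_cluster are separated by "!" and map to $ARG1$ through $ARG4$ in that order.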

The way this works is that whenever both services are not running on that host, the cluster check generates a CRITICAL. check_cluster uses the states of the individual service checks to determine whether there is an issue. Since you disabled the notifications on the individual services, you won't get those notifications; the cluster service is the one that does the notifying.
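
To make the intended outcome concrete, here is a small shell sketch of the logic described above. It is not the check_cluster plugin and does not reproduce how the plugin parses its -w/-c thresholds. It only models the end result we are after: given the comma-separated state IDs that the $SERVICESTATEID$ macros expand to (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), report CRITICAL only when every service is non-OK. The function name cluster_state is made up for illustration.

```shell
# Hypothetical helper, NOT part of Nagios: models "CRITICAL only if all services are non-OK".
# $1 is a comma-separated list of service state IDs, e.g. "0,2".
cluster_state() {
    local IFS=','
    local total=0 nonok=0 s
    for s in $1; do
        total=$((total + 1))
        # Any state other than 0 (OK) counts as non-OK.
        if [ "$s" -ne 0 ]; then
            nonok=$((nonok + 1))
        fi
    done
    if [ "$total" -gt 0 ] && [ "$nonok" -eq "$total" ]; then
        echo "CRITICAL: $nonok of $total services non-OK"
        return 2    # Nagios plugin exit code for CRITICAL
    fi
    echo "OK: $nonok of $total services non-OK"
    return 0        # Nagios plugin exit code for OK
}

# Master running, slave stopped: one process is up, so no alarm.
cluster_state "0,2" || true
# Both stopped: this is the case that should alarm.
cluster_state "2,2" || true
```

You can compare this against what the real plugin returns by running check_cluster by hand with fixed values for -d before relying on the thresholds.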

Please read here for more information:

https://assets.nagios.com/downloads/nag ... sters.html