Disk Failover Monitor

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
wneville
Posts: 100
Joined: Wed Mar 31, 2021 3:35 pm

Disk Failover Monitor

Post by wneville »

Hello,

I have a group of hosts set up for db failovers. The db disks run only on the first server until it fails, at which point those disks disappear from node 1 and come online on node 2. Is check_cluster the best way to monitor the availability of these db disks?

In my initial configuration, I have both nodes configured with the db disks. On the first node, the db disk services are working as expected. However, since those disks don't exist on node 2 until a failure on node 1, the db disk services are reading as "Unknown" on node 2. When node 1 goes down, the db disks read as "Unknown" on node 1 and get picked up on node 2.

The issue here is that the way we alert out to our event manager means that node 2 will be alerting out unnecessarily while node 1 is working properly. Can this be solved via check_cluster or should I look for another plugin?
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Disk Failover Monitor

Post by ssax »

What type of DB cluster is it? Is it Windows/Linux or something else?

I would use check_cluster, but there may be plugins written specifically for this. Once I have the info above, I can search for them.

You could set up a BPI group in Home > BPI (or create a servicegroup for them and use the servicegroup in BPI), set the warning threshold to 50, and set the critical threshold to 0. Then use Configure > Configuration Wizards > BPI Wizard to monitor the status of that BPI group, so that it does the notifying for the group of services (you can disable notifications on the individual services if you don't want them to notify).

https://assets.nagios.com/downloads/nag ... BPI_v2.pdf

You could also do it with check_cluster:

1. Make sure that you are monitoring the service on both servers. These service checks are what the check_cluster plugin uses, so they need to exist. You can disable notifications for them, which is important so that you don't get notifications when they are down.

2. Create a new command:
- Command Name: check_service_cluster
- Command Line: $USER1$/check_cluster --service -l $ARG1$ -w $ARG2$ -c $ARG3$ -d '$ARG4$'
- Command Type: check command

3. Create the service cluster check:
- Description: Service_Cluster
- Check command: check_service_cluster
- $ARG1$: Service_Cluster
- $ARG2$: 3 <- Set this to one MORE than your total number of services (2 services + 1 = 3) - this is to ignore warnings; adjust it if you want warnings
- $ARG3$: 0 <- Set this to zero for cases like this (both not running)
- $ARG4$: $SERVICESTATEID:yourhost1:servicedesc$,$SERVICESTATEID:yourhost2:servicedesc$

NOTE: The hostname and the service description in $ARG4$ need to be exact (case sensitive).

The way this would work is that whenever both services are not running (that is, the disks are not online on either host), it would generate a CRITICAL. check_cluster uses the statuses of the individual service checks to determine whether there is an issue, and since you disabled notifications on the individual services, you won't get those notifications. The cluster check is the service that will do the notifying.
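
If you have many failover pairs, the definitions from steps 2 and 3 could be generated rather than typed by hand. Below is a minimal Python sketch of that idea, not a tested XI workflow. The command line is the one from step 2 and the thresholds follow step 3, but the host names (dbnode1, dbnode2), the service description ("DB Disk"), the generic-service template, and the choice to attach the cluster service to the first host are all placeholders I made up for illustration. You would still need to import the output into XI yourself.

```python
# Sketch only: host names, service description, and the generic-service
# template are placeholders (assumptions), not values from this thread.

# Command definition from step 2, in Nagios object-definition syntax.
COMMAND_DEFINITION = (
    "define command {\n"
    "    command_name    check_service_cluster\n"
    "    command_line    $USER1$/check_cluster --service -l $ARG1$ "
    "-w $ARG2$ -c $ARG3$ -d '$ARG4$'\n"
    "}\n"
)


def cluster_states(hosts, service_desc):
    """Build the $ARG4$ value: one on-demand $SERVICESTATEID$ macro per host."""
    return ",".join(f"$SERVICESTATEID:{host}:{service_desc}$" for host in hosts)


def render_cluster_service(hosts, service_desc, label="Service_Cluster"):
    """Render the cluster service from step 3 for one failover pair."""
    warning = len(hosts) + 1  # one more than the number of services (step 3)
    critical = 0              # value suggested in step 3
    args = "!".join(
        [label, str(warning), str(critical), cluster_states(hosts, service_desc)]
    )
    return (
        "define service {\n"
        "    use                  generic-service\n"  # assumed template name
        f"    host_name            {hosts[0]}\n"      # assumed attachment host
        f"    service_description  {label}\n"
        f"    check_command        check_service_cluster!{args}\n"
        "}\n"
    )


if __name__ == "__main__":
    print(COMMAND_DEFINITION)
    print(render_cluster_service(["dbnode1", "dbnode2"], "DB Disk"))
```

The host names and service description passed in must match your existing service checks exactly, for the same case-sensitivity reason given in the NOTE above.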

Please read here for more information:

https://assets.nagios.com/downloads/nag ... sters.html
wneville
Posts: 100
Joined: Wed Mar 31, 2021 3:35 pm

Re: Disk Failover Monitor

Post by wneville »

The BPI seems to work well. Can BPI configs be written from the command line? We have a lot of Linux DB failover pairs, so it would be much quicker to configure them from there than through the XI interface.
wneville
Posts: 100
Joined: Wed Mar 31, 2021 3:35 pm

Re: Disk Failover Monitor

Post by wneville »

Got it!

/usr/local/nagiosxi/etc/components/bpi.conf

I will explore this but for the time being it seems like a really good solution. Thanks so much!
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Disk Failover Monitor

Post by ssax »

Yep, that's the file, good find! Let us know if you have any questions.