NRPE: Check drive size in active/passive cluster

mhixson2
Posts: 96
Joined: Wed Jun 24, 2015 3:02 pm

NRPE: Check drive size in active/passive cluster

Post by mhixson2 »

Hi guys!

Environment:
Nagios XI 2014R2.7
NSClient++ 0.4.3.143
Mostly NRPE checks

We have several active/passive Windows clusters that serve various roles (file shares, MS SQL, etc.). I am setting up NRPE checks per what I've read online, which is to direct any check for a shared service to the cluster hostname/VIP instead of either of the nodes. Here are my questions/issues:

1. Is this the only way to follow shared services between nodes? Scratch that - what is the best way to do this?
2. I have no issues with check_cpu or check_service when pointed at the cluster, but check_drivesize is giving me trouble. I should actually be good to go if I can get this to work. Here is the command I'm running on the Nagios server to test the check:

Code: Select all

$ ./check_nrpe -H [cluster hostname] -t 30 -c check_drivesize
Pretty simple. Here's what it returns:

Code: Select all

CHECK_NRPE: Invalid packet type received from server.
Googling led me to suggestions for enabling arguments or making other changes in nsclient.ini, but all of that is already enabled. The same nsclient.ini config and this check work fine on any other server.
If I try

Code: Select all

$ ./check_nrpe -H [cluster hostname] -t 30 -c check_drivesize -a drive=*
Same Invalid packet type error.
If I try (local OS drive)

Code: Select all

$ ./check_nrpe -H [cluster hostname] -t 30 -c check_drivesize -a drive=c:
Success!

Code: Select all

OK All 1 drive(s) are ok|'c: used'=130.36176GB;223.41249;251.33905;0;279.26562 'c: used %'=46%;79;89;0;100
If I try (cluster drive presented from SAN)

Code: Select all

$ ./check_nrpe -H [cluster hostname] -t 30 -c check_drivesize -a drive=o:
Success!

Code: Select all

OK All 1 drive(s) are ok|'o: used'=32.76171MB;791.99687;890.99648;0;989.99609 'o: used %'=3%;79;89;0;100
I'm trying to avoid defining all drives by name, as that changes across clusters. Our other check_drivesize definitions check for all local drives with this command, but it also returns the Invalid packet type when run against a cluster.

Code: Select all

$ ./check_nrpe -H [cluster hostname] -t 30 -c check_drivesize -a "drive regexp '.*[C-Z].*'" 'warn=free lt 20%' 'crit=free lt 10%'
Any ideas?
Thanks!
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: NRPE: Check drive size in active/passive cluster

Post by jdalrymple »

Run the check passively and configure nscp as a generic cluster application?
mhixson2
Posts: 96
Joined: Wed Jun 24, 2015 3:02 pm

Re: NRPE: Check drive size in active/passive cluster

Post by mhixson2 »

jdalrymple wrote:Run the check passively and configure nscp as a generic cluster application?
Ok, good suggestion. Using NSCA checks, I assume? I'm looking at the doc below and I'll have to get our networking guy to open port 5667 before I can test.
Though, if I'm monitoring the cluster and its nodes with NSClient, won't I end up with it trying to place duplicate nscp services on the active node? I need to be able to monitor local node resources (system drive, cpu, memory) and cluster resources (SAN drives, services) as they're passed back and forth.

https://library.nagios.com/library/prod ... ive-checks

Thanks!
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: NRPE: Check drive size in active/passive cluster

Post by jdalrymple »

mhixson2 wrote:Using NSCA checks, I assume? I'm looking at the doc below and I'll have to get our networking guy to open port 5667 before I can test.
That is how I'd do it - it's pretty straightforward.
mhixson2 wrote:Though, if I'm monitoring the cluster and its nodes with NSClient, won't I end up with it trying to place duplicate nscp services on the active node? I need to be able to monitor local node resources (system drive, cpu, memory) and cluster resources (SAN drives, services) as they're passed back and forth.
I think I'd probably have 3 hosts: node1, node2 and cluster. It's not ideal, but logically it makes the most sense. Some resources are shared and some aren't, and without some pretty serious reconfigure script-fu there is no way for Nagios to know which node has the shared resources at any given time.
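A rough sketch of that three-host layout as Nagios object definitions. The host names, addresses, and the `windows-server` template are placeholders and assumptions, not taken from this environment. Node-local checks (system drive, CPU, memory) would attach to the two node hosts, and checks for shared resources (SAN drives, clustered services) to the cluster host:

```
# Hypothetical names, addresses, and template; adjust to your environment.
define host {
    use        windows-server   ; assumed template, as shipped in the Nagios Core sample config
    host_name  clu01-node1      ; physical node 1: local checks only
    address    192.0.2.11
}

define host {
    use        windows-server
    host_name  clu01-node2      ; physical node 2: local checks only
    address    192.0.2.12
}

define host {
    use        windows-server
    host_name  clu01            ; cluster hostname/VIP: shared resources only
    address    192.0.2.10
}
```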
mhixson2
Posts: 96
Joined: Wed Jun 24, 2015 3:02 pm

Re: NRPE: Check drive size in active/passive cluster

Post by mhixson2 »

Ok, that makes sense to me too. That should work.

I found out that the commands are not working for this cluster because the SAN drives are presented to the OS as mountpoints, which is a departure from how they're presented to any other cluster in our environment. Other clusters work just fine with my regexp check above. So the solution here is to separate these mountpoint clusters (there are only a few) into their own hostgroup and define the mountpoint SAN drives individually (drive=H: drive=O: etc.), since that seems to work without issue. Not too bad of a workaround.
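
A minimal sketch of that workaround, assuming the drive letters for each mountpoint cluster are kept in a simple list. The `build_drive_args` helper and the `DRIVES` value are hypothetical, and the script only prints the resulting command as a dry run instead of executing it:

```shell
#!/bin/sh
# Hypothetical helper: turn a list of drive letters into
# "drive=X:" arguments for check_drivesize.
build_drive_args() {
    args=""
    for d in "$@"; do
        # Append with a separating space only when args is non-empty.
        args="${args:+$args }drive=$d"
    done
    printf '%s\n' "$args"
}

# Placeholder drive list for one mountpoint cluster.
DRIVES="H: O:"

# Dry run: print the command instead of executing it.
# $DRIVES is left unquoted on purpose so it splits into separate letters.
printf '%s\n' "./check_nrpe -H [cluster hostname] -t 30 -c check_drivesize -a $(build_drive_args $DRIVES) 'warn=free lt 20%' 'crit=free lt 10%'"
```

Keeping the drive list separate from the command means each mountpoint cluster only needs its own list, while the thresholds stay the same as in the regexp check.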

I'm good on this topic unless you guys have anything to add as to why the wildcard or regexp checks don't work in this situation.

Thanks!
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: NRPE: Check drive size in active/passive cluster

Post by lmiltchev »

mhixson2 wrote:I'm good on this topic unless you guys have anything to add as to why the wildcard or regexp checks don't work in this situation.
You may try something like this:

Code: Select all

$ ./check_nrpe -H [cluster hostname] -t 30 -c check_drivesize -a "filter=name regexp '.*[C-Z].*'" 'warn=free lt 20%' 'crit=free lt 10%'
I am not sure if this is going to work, but I found a post by the developer of NSClient++ that shows a similar syntax:

http://forums.nsclient.org/t/check-disk ... ves/3655/3

If this is not working, you should probably post your issue on the NSClient++ support forum. I will go ahead and mark this topic as "resolved". Thanks!