Disk checks question

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH
Contact:

Disk checks question

Post by BanditBBS »

Let me try and explain my setup and then desired outcome.

Current Setup:
60+ AIX servers configured and being monitored in XI.
Each server has a standard set of drives being monitored (each using check_disk with the -p parameter), plus additional drives (paths) that differ per server. We do this because each drive needs different W and C criteria.

The above setup is working great. What I want now is to add a catch-all check, so that if a new drive appears I am alerted and can add it to XI for monitoring. An example check I came up with is:

Code: Select all

./check_nrpe -H 10.97.235.15 -t 30 -c check_disk -a '-w 1000 -c 500 -A -x / -x /usr -x /home -x /tmp -x /notes -x /u01 -x /proc -x /opt -x /tomaxbin -x / -i /var* -i /notes*' -n
That excludes all drives that I don't care about or that I am already monitoring. The issue is that, since no drives are left for the check to evaluate, it responds to Nagios with:

Code: Select all

DISK UNKNOWN - free space:|
I'd love to figure out how to get it to return with an OK status. Anyone have any ideas? :?
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Disk checks question

Post by scottwilkerson »

Use the negate plugin

Code: Select all

http://library.nagios.com/library/products/nagiosxi/documentation/633-using-the-negate-plugin
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Disk checks question

Post by abrist »

As check_disk is a compiled binary, our options are limited. The negate plugin is your best bet. If you want to roll your own solution:

You may be able to create a wrapper script that passes the arguments to check_disk, captures the output and return code, and changes UNKNOWN to OK (no new disks) and OK to CRITICAL (new disks detected), while leaving CRITICAL unchanged (new disk detected, over threshold). The plugin would then return the check_disk string with the status info and exit code changed. This method will have all sorts of 'fun' escaping issues (passing the arguments through the wrapper script to check_disk may need awkward syntax), but it will most likely work.
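
A minimal sketch of the wrapper described above, assuming the wrapped plugin follows the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name `remap_new_disk_state` is hypothetical, and passing WARNING through unchanged is an assumption, since the description only covers the other three states.

```shell
#!/bin/bash
# Hypothetical wrapper: run the wrapped plugin, keep its text, remap its state.
remap_new_disk_state() (
    # Subshell body: 'exit' sets the function's status, variables stay private.
    output=$("$@")      # "$@" keeps every original argument as one word
    status=$?
    case "$status" in
        3) status=0 ;;  # UNKNOWN (no disks matched) becomes OK
        0) status=2 ;;  # OK (an unmonitored disk was found) becomes CRITICAL
        *) ;;           # WARNING and CRITICAL pass through unchanged
    esac
    printf '%s\n' "$output"
    exit "$status"
)

# When run as a script, wrap the command line given as arguments.
if [ "$#" -gt 0 ]; then
    remap_new_disk_state "$@"
    exit $?
fi
```

Called as `wrapper /path/to/check_nrpe -H host ...`, this returns the plugin's own text with only the exit code changed, so the status word inside the text may still read UNKNOWN.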
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Re: Disk checks question

Post by BanditBBS »

That's a good idea. I like being able to change the text, so I will write a wrapper.

Re: Disk checks question

Post by scottwilkerson »

The negate plugin can also change UNKNOWNs to OK. Same result, two ways to do it:
# /usr/local/nagios/libexec/negate -h
negate v1989 (nagios-plugins 1.4.13)
Copyright (c) 2002-2008 Nagios Plugin Development Team
<[email protected]>

Negates the status of a plugin (returns OK for CRITICAL and vice-versa).
Additional switches can be used to control which state becomes what.


Usage:negate [-t timeout] [-owcu STATE] [-s] <definition of wrapped plugin>

Options:
-h, --help
Print detailed help screen
-V, --version
Print version information
-t, --timeout=INTEGER
Seconds before connection times out (default: 11)
Keep timeout longer than the plugin timeout to retain CRITICAL status.
-o, --ok=STATUS
-w, --warning=STATUS
-c, --critical=STATUS
-u, --unknown=STATUS
STATUS can be 'OK', 'WARNING', 'CRITICAL' or 'UNKNOWN' without single
quotes. Numeric values are accepted. If nothing is specified, permutes
OK and CRITICAL.
-s, --substitute
Substitute output text as well. Will only substitute text in CAPITALS

Examples:
negate /usr/local/nagios/libexec/check_ping -H host
Run check_ping and invert result. Must use full path to plugin
negate -w OK -c UNKNOWN /usr/local/nagios/libexec/check_procs -a 'vi negate.c'
This will return OK instead of WARNING and UNKNOWN instead of CRITICAL

Notes:
This plugin is a wrapper to take the output of another plugin and invert it.
The full path of the plugin must be provided.
If the wrapped plugin returns OK, the wrapper will return CRITICAL.
If the wrapped plugin returns CRITICAL, the wrapper will return OK.
Otherwise, the output state of the wrapped plugin is unchanged.

Send email to [email protected] if you have questions
regarding use of this software. To submit patches or suggest improvements,
send email to [email protected]
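
For the catch-all check in this thread, the switches listed above could be combined roughly as follows. This is a sketch, not a tested command: `check_new_disks` is a hypothetical helper, the plugin directory is assumed to be /usr/local/nagios/libexec, and `-t 35` is an assumed value chosen to stay above check_nrpe's 30-second timeout. It is also an assumption that the multi-word `-a` value reaches check_nrpe as a single argument through negate, which is worth confirming on the command line first.

```shell
#!/bin/bash
# Assumed plugin directory; override LIBEXEC if yours differs.
LIBEXEC="${LIBEXEC:-/usr/local/nagios/libexec}"

# Hypothetical helper. $1 is the host address, $2 the check_disk argument string.
# -u OK       : UNKNOWN (nothing left to check) is reported as OK
# -o CRITICAL : OK (an unmonitored mount point was found) is reported as CRITICAL
# WARNING and CRITICAL are not remapped, so they pass through as they are.
check_new_disks() {
    "$LIBEXEC/negate" -t 35 -u OK -o CRITICAL \
        "$LIBEXEC/check_nrpe" -n -H "$1" -t 30 -c check_disk -a "$2"
}

# Example call with the host and exclusions used earlier in this thread:
# check_new_disks 10.97.235.15 '-w 1000 -c 500 -A -x / -x /usr -x /home -x /tmp -x /u01 -x /proc -x /opt -x /tomaxbin'
```

Adding `-s` would also substitute the status word in the output text, per the help screen above.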

Re: Disk checks question

Post by BanditBBS »

Ok, let me see if anyone here can help me, as I am stumped with my script :)

This command works fine:

Code: Select all

./check_nrpe -n -H 10.97.235.15 -t 30 -c check_disk -a '-w 1000 -c 500 -A -x / -x /usr -x /home -x /tmp -x /u01 -x /proc -x /opt -x /tomaxbin -i '/var*$' -i '^/notes*$''
I get the result back:

Code: Select all

DISK UNKNOWN - free space:|
I am trying to write the script to return OK if it gets that back and CRITICAL for anything else.
Here is what I wrote:

Code: Select all

#!/bin/bash
if [[ $(/usr/local/nagios/libexec/check_nrpe -n -H $1 -t 30 -c check_disk -a $2) == "DISK UNKNOWN - free space:|" ]]
then
        echo "OK: No new drives!";
        exit 0;
else
        echo "CRITICAL: New drives!";
        exit 2;
fi;
That does not work if I enter this on the command line:

Code: Select all

./check_new_disk 10.97.235.15 "'-w 1000 -c 500 -A -x / -x /usr -x /home -x /tmp -x /u01 -x /proc -x /opt -x /tomaxbin -i '^/var*$' -i '^/notes*$''"
I've played around with the quotes, either around the $2 in the script or on the command line itself, and no matter how I do it, it just won't work. I'm sure I have to escape something, or do something else that is simple, but I just can't figure it out.

I know this isn't a script writing forum, but thought I'd ask, just in case it's that simple :)

Re: Disk checks question

Post by scottwilkerson »

How about something like this

Code: Select all

#!/bin/bash

if [[ $("$@") == "DISK UNKNOWN - free space:|" ]]
then
        echo "OK: No new drives!";
        exit 0;
else
        echo "CRITICAL: New drives!";
        exit 2;
fi;
and call with

Code: Select all

/usr/local/nagios/libexec/check_new_disk /usr/local/nagios/libexec/check_nrpe -n -H 10.97.235.15 -t 30 -c check_disk -a "-w 1000 -c 500 -A -x / -x /usr -x /home -x /tmp -x /u01 -x /proc -x /opt -x /tomaxbin -i '/var*$' -i '^/notes*$'"
That translates to this command definition:

Code: Select all

$USER1$/check_new_disk $USER1$/check_nrpe -n -H $HOSTADDRESS$ -t 30 -c check_disk -a "-w 1000 -c 500 -A -x / -x /usr -x /home -x /tmp -x /u01 -x /proc -x /opt -x /tomaxbin -i '/var*$' -i '^/notes*$'"
Last edited by scottwilkerson on Mon Aug 05, 2013 11:41 am, edited 1 time in total.
Reason: modified to change to double quotes around -a
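
Why `"$@"` in the script above behaves differently from the unquoted `$2` in the earlier attempt can be seen with a small experiment. This is an illustrative sketch only; `count_args`, `forward_quoted` and `forward_unquoted` are hypothetical helpers, not part of any plugin.

```shell
#!/bin/bash
# Hypothetical helper: print how many arguments were received.
count_args() {
    echo "$#"
}

# Stand-in for a wrapper that forwards everything with "$@".
forward_quoted() {
    count_args "$@"     # each original argument stays one word
}

# Stand-in for a wrapper that forwards an unquoted $2.
forward_unquoted() {
    count_args $2       # split on whitespace; quote characters inside stay literal
}

args='-w 1000 -c 500 -A'

forward_quoted host "$args"      # prints 2: the host plus one intact string
forward_unquoted host "$args"    # prints 5: the string falls apart into words
```

Quote characters stored inside a variable are never re-parsed by the shell, which is why wrapping the string in an extra pair of single quotes on the command line did not help. An unquoted expansion is also subject to filename expansion, so a pattern such as `/var*` may be rewritten before the plugin sees it.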

Re: Disk checks question

Post by BanditBBS »

Scott,

Not quite, but VERY close, which helped me resolve my issue. So I owe you one!

Just in case anyone else ever needs to do something silly like this, I'll explain this.

Our AIX servers (this could be used on any *nix) all have different drives and different requirements per drive, so every mount point we want monitored is a separate check because the W and C values differ. We also wanted a mechanism to alert the Nagios admins (me) if a new mount point was ever added to a server, so I could add a check for it. If you run check_disk and exclude and/or ignore everything, you get an UNKNOWN reply. I wrote this to change the UNKNOWN into an OK and anything else (meaning a new drive was found) into a CRITICAL. The only change from Scott's message above was to the Nagios command: I changed the double quotes (which didn't work) into single quotes and replaced everything between them with $ARG1$.

So I used the same script Scott gave me, but changed the call to:

Code: Select all

/usr/local/nagios/libexec/check_new_disk /usr/local/nagios/libexec/check_nrpe -n -H 10.97.235.15 -t 30 -c check_disk -a '-w 1000 -c 500 -A -x / -x /usr -x /home -x /tmp -x /u01 -x /proc -x /opt -x /tomaxbin -i '/var*$' -i '^/notes*$''
and changed the command definition in Nagios to:

Code: Select all

$USER1$/check_new_disk $USER1$/check_nrpe -n -H $HOSTADDRESS$ -t 30 -c check_disk -a '$ARG1$'
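
One possible refinement, sketched here under assumptions rather than taken from the thread: because the wrapper replaces the plugin text with a fixed message, the alert does not say which mount point appeared. The variant below keeps the same string comparison but appends whatever check_disk returned, minus the performance data after the `|`. The name `report_new_disks` is hypothetical.

```shell
#!/bin/bash
# The exact text check_disk returned in this thread when nothing matched.
EMPTY_RESULT='DISK UNKNOWN - free space:|'

report_new_disks() (
    # Subshell body: 'exit' sets the function's status, variables stay private.
    output=$("$@")
    if [ "$output" = "$EMPTY_RESULT" ]; then
        echo "OK: No new drives!"
        exit 0
    fi
    # Anything else is treated as CRITICAL, as in the original wrapper.
    # That includes NRPE errors, so the plugin text is shown to tell them apart.
    # ${output%%|*} drops the performance data that follows the first '|'.
    echo "CRITICAL: New drives! ${output%%|*}"
    exit 2
)

# When run as a script, wrap the command line given as arguments.
if [ "$#" -gt 0 ]; then
    report_new_disks "$@"
    exit $?
fi
```

It is called the same way as the original wrapper, with the full check_nrpe command line as its arguments.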

Re: Disk checks question

Post by scottwilkerson »

That makes total sense: what I had would have stripped out the $, whereas what you have should pass it through literally.

Anyways, glad you resolved it...