Treating Unknown as Critical
Posted: Tue Aug 26, 2014 3:39 am
by cylindric
We have a bunch of servers that are connected to an iSCSI SAN, and we monitor them for drive space with check_nrpe:
Code:
$USER1$/check_nrpe -H $HOSTADDRESS$ -c CheckDriveSize -a MinWarnFree=10% MinCritFree=5% Drive=E
We recently had a SAN issue that disconnected that volume, bringing down a bunch of our SQL resources. Nagios ignored it because CheckDriveSize seems to report a missing drive as 'UNKNOWN'. Can this be configured so that UNKNOWN is treated as CRITICAL?
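For context, Nagios derives a service's state from the plugin's exit code: 0 is OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN. One generic way to get the behavior asked about here is to put a wrapper in front of the check that rewrites exit code 3 to 2. The following is a minimal sketch in Python; the script and its name (unknown_as_critical.py) are hypothetical, not part of Nagios or NSClient++, and it assumes the wrapped plugin prints its status line to stdout.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: run a Nagios plugin and report UNKNOWN as CRITICAL."""
import subprocess
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def remap_status(code: int) -> int:
    """Keep OK/WARNING/CRITICAL; map UNKNOWN (or any other code) to CRITICAL."""
    if code in (OK, WARNING, CRITICAL):
        return code
    return CRITICAL


def main(argv: list) -> int:
    if not argv:
        # Usage error in the wrapper itself, not a result from the plugin.
        print("UNKNOWN: no plugin command given")
        return UNKNOWN
    try:
        result = subprocess.run(argv, capture_output=True, text=True)
    except OSError as exc:
        # The plugin could not be started at all (missing file, permissions).
        print(f"CRITICAL: could not run {argv[0]}: {exc}")
        return CRITICAL
    # Prefer the plugin's stdout; fall back to stderr if stdout is empty.
    output = result.stdout.strip() or result.stderr.strip()
    status = remap_status(result.returncode)
    if status != result.returncode:
        output = f"CRITICAL (plugin exit code {result.returncode}): {output}"
    print(output)
    return status


# Act as a wrapper only when a plugin command line is supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

It could then be used as a prefix in a command definition, for example command_line /usr/bin/python3 /path/to/unknown_as_critical.py $USER1$/check_nrpe -H $HOSTADDRESS$ -c CheckDriveSize -a MinWarnFree=10% MinCritFree=5% Drive=E (paths hypothetical). The trade-off is that every UNKNOWN, including ones caused by a mistyped argument, now alerts as CRITICAL.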
Re: Treating Unknown as Critical
Posted: Tue Aug 26, 2014 10:53 am
by eloyd
What is the line in the client's nrpe.cfg file that defines the CheckDriveSize command?
Re: Treating Unknown as Critical
Posted: Tue Aug 26, 2014 11:12 am
by cylindric
There isn't one.
Code:
[/modules]
CheckDisk = 1
CheckEventLog = 1
CheckExternalScripts = 1
CheckHelpers = 1
CheckNSCP = 1
CheckSystem = 1
CheckWMI = 1
NRPEServer = 1
NSClientServer = 1
[/settings/default]
allowed hosts = 10.10.0.6
[/settings/NRPE/server]
allow arguments = true
allow_nasty_meta_chars = 1
Re: Treating Unknown as Critical
Posted: Tue Aug 26, 2014 11:22 am
by eloyd
Somewhere on the client there needs to be a configuration file that tells NRPE which UNIX command(s) to run when it is asked to execute the NRPE command "CheckDriveSize". Unfortunately, I don't recognize the format of the file you posted, so I don't know where that is. Find it and look for the line that mentions CheckDriveSize.
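For reference, on a host running the standard NRPE daemon, that mapping is a set of command[name]=command line entries in nrpe.cfg. A minimal Python sketch that lists such definitions, assuming that format; the sample paths are only illustrative. NSClient++ provides CheckDriveSize as a built-in module command, which is why the NSClient++ file posted in this thread has no such line.

```python
import re

# Matches NRPE command definitions such as:
#   command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
COMMAND_RE = re.compile(r"^command\[([^\]]+)\]\s*=\s*(.*\S)$")


def parse_nrpe_commands(config_text: str) -> dict:
    """Return a mapping of NRPE command name -> command line it executes."""
    commands = {}
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = COMMAND_RE.match(stripped)
        if match:
            commands[match.group(1)] = match.group(2)
    return commands


sample = """
# Example nrpe.cfg fragment (hypothetical paths)
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_root]=/usr/lib/nagios/plugins/check_disk -w 10% -c 5% -p /
"""

commands = parse_nrpe_commands(sample)
print(commands.get("CheckDriveSize", "no definition for CheckDriveSize"))
```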

Re: Treating Unknown as Critical
Posted: Tue Aug 26, 2014 6:39 pm
by cylindric
The command works fine for any parameters that don't consist of a string, so I know the command is working in general. That's a standard NSClient++ config file.
Re: Treating Unknown as Critical
Posted: Tue Aug 26, 2014 6:52 pm
by eloyd
cylindric wrote: "That's a standard NSClient++ config file."
True, but you never said you were using NSClient++.

That makes more sense now.
Okay. It's been a long time since I've played with NSClient++, but I think you need a colon (:) at the end of your drive specification. Try:
Code:
$USER1$/check_nrpe -H $HOSTADDRESS$ -c CheckDriveSize -a MinWarnFree=10% MinCritFree=5% Drive=E:
(note the colon).
A typical Nagios config for this is:
Code:
define command {
    command_name  check_driveSize
    command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -c CheckDriveSize -a ShowAll MinWarnFree=$ARG1$ MinCritFree=$ARG2$ Drive=$ARG3$
}
Note the "ShowAll". You then pass 10%!5%!e: as your arguments (for example, check_command check_driveSize!10%!5%!e:). So, two suggestions:
- Try adding the colon
- Try adding ShowAll
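To make the argument passing concrete: in a service's check_command, the command name and its arguments are separated by !, and Nagios substitutes the arguments into $ARG1$, $ARG2$, ... in the matching command_line. Below is a simplified Python sketch of just that substitution step, with hypothetical arguments 10%, 5% and e:. The expand_args function is illustrative only; it leaves other macros such as $USER1$ and $HOSTADDRESS$ untouched (Nagios resolves those separately) and does not model every Nagios parsing rule.

```python
import re


def expand_args(check_command: str, command_line: str) -> str:
    """Substitute $ARGn$ macros from a '!'-separated check_command (simplified)."""
    _name, *args = check_command.split("!")

    def replace(match):
        index = int(match.group(1)) - 1
        # In this sketch, a missing argument expands to an empty string.
        return args[index] if 0 <= index < len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", replace, command_line)


template = ("$USER1$/check_nrpe -H $HOSTADDRESS$ -c CheckDriveSize -a ShowAll "
            "MinWarnFree=$ARG1$ MinCritFree=$ARG2$ Drive=$ARG3$")
print(expand_args("check_driveSize!10%!5%!e:", template))
# $USER1$/check_nrpe -H $HOSTADDRESS$ -c CheckDriveSize -a ShowAll MinWarnFree=10% MinCritFree=5% Drive=e:
```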
Re: Treating Unknown as Critical
Posted: Tue Aug 26, 2014 10:23 pm
by millisa
The behavior doesn't sound wrong to me: you can't check the size of a drive the host can't see, so the check *should* come back as UNKNOWN. If you want notifications for a service in an UNKNOWN state, you need to include that state in the 'notification_options' directive of the service definition. It wouldn't surprise me if your notification options are set to just 'w,c,r' (warning, critical, recovery). You probably want w,c,r,u to get UNKNOWN alerts too. Here is a quick excerpt on 'notification_options' from the service definition section of
http://nagios.sourceforge.net/docs/3_0/ ... ml#service (emphasis mine):
This directive is used to determine when notifications for the service should be sent out. Valid options are a combination of one or more of the following: w = send notifications on a WARNING state, u = send notifications on an UNKNOWN state, c = send notifications on a CRITICAL state, r = send notifications on recoveries (OK state), f = send notifications when the service starts and stops flapping, and s = send notifications when scheduled downtime starts and ends. If you specify n (none) as an option, no service notifications will be sent out. If you do not specify any notification options, Nagios will assume that you want notifications to be sent out for all possible states. Example: If you specify w,r in this field, notifications will only be sent out when the service goes into a WARNING state and when it recovers from a WARNING state.
Code:
define service {
    use                   sometemplate_services
    host_name             someservername
    check_command         check_nrpe_yourcustomcommandname!awesomeargument1
    service_description   NRPE-disk-withcheese
    notification_options  w,c,r,u  ; note: this overrides any notification_options set in the sometemplate_services template named on the 'use' line above
}
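The notification_options rule quoted above boils down to a lookup from event type to option letter. The following is a simplified Python sketch of only that rule; the event names and the should_notify function are hypothetical, and real Nagios also applies notification periods, contacts and escalations, none of which are modeled here.

```python
from typing import Optional

# Option letters from the Nagios 3 service notification_options directive.
OPTION_FOR_EVENT = {
    "WARNING": "w",
    "UNKNOWN": "u",
    "CRITICAL": "c",
    "RECOVERY": "r",  # service returned to OK
    "FLAPPING": "f",  # flapping started or stopped
    "DOWNTIME": "s",  # scheduled downtime started or ended
}


def should_notify(event: str, notification_options: Optional[str]) -> bool:
    """Return True if the option string allows a notification for this event."""
    if not notification_options:
        # No options specified: Nagios assumes notifications for all states.
        return True
    options = {opt.strip() for opt in notification_options.split(",")}
    if "n" in options:
        return False  # 'n' (none) disables service notifications
    return OPTION_FOR_EVENT[event] in options


print(should_notify("UNKNOWN", "w,c,r"))    # False
print(should_notify("UNKNOWN", "w,c,r,u"))  # True
```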
Re: Treating Unknown as Critical
Posted: Wed Aug 27, 2014 3:32 am
by cylindric
Thanks, I've now enabled "u" notifications as well. I also managed to link the failure state to an event log entry, so I now monitor for that too.
Re: Treating Unknown as Critical
Posted: Wed Aug 27, 2014 9:10 am
by tmcdonald
cylindric, are we good to close the thread?
Re: Treating Unknown as Critical
Posted: Wed Aug 27, 2014 9:23 am
by cylindric
Sure.