
check_disk disable critical

Posted: Fri May 13, 2016 6:55 am
by chrisgoforth@rcn.com
I am new to configuring Nagios alerting and need some help with what is probably a simple fix. I have file system monitoring set up and working fine in my environment, but I need to modify it for our DEV/QA areas. Currently we get a warning-level alert at 90% full and a critical at 95%. I need to get rid of critical altogether. We want a warning alert when a FS gets above a certain threshold, but even if it reaches 100% it is still not considered a "critical" event in this area. Any suggestions?

# 'DEV File System' command definition
define command{
        command_name         DEV_check_local_disk
        command_line         $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c check_disk -a $ARG1$ $ARG2$ $ARG3$
        }

define service{
        use                  dev_default-service
        service_description  DEV FS: /
        check_command        DEV_check_local_disk!10%!5%!/
        contact_groups       TEST-CG
        servicegroups        DEV_FS_Root-SG
        }

Re: check_disk disable critical

Posted: Fri May 13, 2016 12:54 pm
by rkennedy
It sounds like you would need to modify the plugin directly, and see if you can have it just exit on a WARNING instead of CRITICAL.

You could also use the negate plugin to switch CRIT -> WARN. Take a look at this document for more information: https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Re: check_disk disable critical

Posted: Fri May 13, 2016 12:56 pm
by tgriep
Another option is to add a notification_options directive to that service so that notifications are only sent for warnings and recoveries, like the example below.

Code:

notification_options		w,r

Re: check_disk disable critical

Posted: Fri May 13, 2016 1:13 pm
by mcapra
You could try leaving the critical threshold undefined like so:

Code:

define service{
        use                  dev_default-service
        service_description  DEV FS: /
        check_command        DEV_check_local_disk!10%!!/
        contact_groups       TEST-CG
        servicegroups        DEV_FS_Root-SG
        }
This works for me in core for check_local_disk. Your results may vary depending on your DEV_check_local_disk behavior.

Re: check_disk disable critical

Posted: Thu May 19, 2016 9:47 am
by chrisgoforth@rcn.com
I have been playing around with this and have had no luck so far. I tried the "check_command DEV_check_local_disk!10%!!/" recommendation and it just threw an error saying I had to define a criteria for critical.

I even tried giving it a negative value to try and fool it and it really disliked that.

As for the notification suggestion, the problem there is that if it goes straight to critical, no one will get notified about it, which would be a problem.

Re: check_disk disable critical

Posted: Thu May 19, 2016 3:48 pm
by tgriep
Take a look at the Exchange site to see if there are other disk checks that have optional thresholds.
https://exchange.nagios.org/

Re: check_disk disable critical

Posted: Thu May 19, 2016 4:51 pm
by mcapra
I notice the command definition for DEV_check_local_disk is asking for 3 args in a non-specific way:

command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c check_disk -a $ARG1$ $ARG2$ $ARG3$

You could try explicitly declaring the arguments you are sending to the remote machine's check_disk script like so:

Code:

# 'DEV File System' command definition
define command{
    command_name    DEV_check_local_disk
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c check_disk -a '-w $ARG1$ -p $ARG2$'
    }
Notice how I am telling the check_disk command on the remote machine exactly what each argument is used for. I am saying $ARG1$ is used for the "warning" threshold and $ARG2$ is being used for the "path".

Your service definition should change accordingly:

Code:

define service{
        use                  dev_default-service
        service_description  DEV FS: /
        check_command        DEV_check_local_disk!10%!/
        contact_groups       TEST-CG
        servicegroups        DEV_FS_Root-SG
        }
The check_command field of a service definition feeds in arguments sequentially, each following a ! character. E.g., DEV_check_local_disk!$ARG1$!$ARG2$!$ARG3$ ... !$ARGN$
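
To make the argument mapping concrete, here is a minimal Python sketch of it. This is an illustration only, not Nagios's actual code: the helper names expand_args and forwarded_args are hypothetical, escaped \! characters are ignored, and shlex.split is assumed to approximate how the expanded command line is split into words. It suggests that with 10%!!/ the empty $ARG2$ simply disappears, so check_nrpe would forward only two arguments.

```python
import re
import shlex
from typing import List


def expand_args(check_command: str, command_line: str) -> str:
    """Fill $ARGn$ macros from a '!'-separated check_command (simplified)."""
    _name, *args = check_command.split("!")

    def fill(match):
        index = int(match.group(1)) - 1
        # A macro with no matching argument expands to an empty string.
        return args[index] if 0 <= index < len(args) else ""

    # Only $ARGn$ is handled here; $USER1$ and $HOSTADDRESS$ are left alone.
    return re.sub(r"\$ARG(\d+)\$", fill, command_line)


def forwarded_args(expanded: str) -> List[str]:
    """Return the words that follow -a, i.e. what check_nrpe would forward."""
    words = shlex.split(expanded)
    return words[words.index("-a") + 1:]


# Command line from the original post (three positional arguments).
COMMAND_LINE = ("$USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 "
                "-c check_disk -a $ARG1$ $ARG2$ $ARG3$")

print(forwarded_args(expand_args("DEV_check_local_disk!10%!5%!/", COMMAND_LINE)))
# ['10%', '5%', '/']
print(forwarded_args(expand_args("DEV_check_local_disk!10%!!/", COMMAND_LINE)))
# ['10%', '/']
```

If the remote check_disk command in nrpe.cfg expects three positional arguments (that definition is not shown in this thread, so this is only a guess), the path would land in the critical slot, which might explain an error about the critical threshold.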


chrisgoforth@rcn.com wrote: I had to define a criteria for critical
I believe this is because your current command definition is trying to send the remote machine's check_disk 3 arguments (in a non-specific way) when all you really need is 2 arguments (sent in a specific way): the warning threshold and the path.

Re: check_disk disable critical

Posted: Fri May 20, 2016 9:05 am
by chrisgoforth@rcn.com
Through the help and guidance of everyone on here, I finally found the solution. It is probably not the only solution, but we have verified it works. It was a little rough figuring out that I needed a space before the -w; otherwise it was trying to check /local-w. This is now working flawlessly. Thank you, everyone, for all of your help.

define service{
        use                  dev_default-service
        service_description  DEV FS: /local
        check_command        DEV_check_local_disk!'-p /local'' -w 10%'
        contact_groups       TEST-CG
        servicegroups        DEV_FS_Local-SG
        }


# 'DEV File System' command definition
define command{
        command_name         DEV_check_local_disk
        command_line         $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c check_disk -a $ARG1$ $ARG2$
        }
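
For anyone wondering why the space before -w matters, here is a rough Python sketch. It assumes the expanded command line is tokenized with POSIX shell quoting rules, which shlex.split approximates; the helper name nrpe_forwarded and the placeholder host are hypothetical. Two quoted strings written back to back are joined into a single word, so the only separator between /local and -w is whatever sits inside the quotes.

```python
import shlex
from typing import List


def nrpe_forwarded(arg1: str) -> List[str]:
    """Words following -a once $ARG1$ is substituted ($ARG2$ left empty)."""
    # "host" stands in for $HOSTADDRESS$; it is only a placeholder.
    line = f"check_nrpe -H host -t 30 -c check_disk -a {arg1} "
    words = shlex.split(line)
    return words[words.index("-a") + 1:]


# With the space inside the second quoted string (the working config):
print(nrpe_forwarded("'-p /local'' -w 10%'"))
# ['-p /local -w 10%']

# Without the space, the path and the option run together:
print(nrpe_forwarded("'-p /local''-w 10%'"))
# ['-p /local-w 10%']
```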

Re: check_disk disable critical

Posted: Fri May 20, 2016 9:13 am
by mcapra
Glad to hear you were able to find a solution! Can I lock this thread and mark the issue as resolved?

Re: check_disk disable critical

Posted: Fri May 20, 2016 11:15 am
by chrisgoforth@rcn.com
You can. And again thanks to everyone who assisted with this.