Monitoring Disk Space by Size, rather than by percentage

neworderfac33
Posts: 329
Joined: Fri Jul 24, 2015 11:04 am

Monitoring Disk Space by Size, rather than by percentage

Post by neworderfac33 »

Good afternoon - one last one before Christmas..!

I want to be able to monitor disk space by available size, rather than available percentage.

I have Googled and found the following solution at http://serverfault.com/questions/309913 ... heck-nt-co

As suggested, I used VIM to create /usr/local/nagios/libexec/check_disk_by_size.sh which looks like this (the third version of the script)

Code: Select all

#!/bin/bash
# Date: 2015-06-30
# Purpose: A wrapper script for check_nt to set threshold for exact space
#          free rather than just percentage.  Useful on VERY large drives.
# Example: check_disk_by_size.sh -H 192.168.0.1 -l c -w 10240 -c 5120

usage() { echo "$0 -H host [-s password] [-p port] [-w warning] [-c critical] [-l params]" 1>&2; exit 1; }

while getopts ":H:s:p:l:w:c:" opt; do
    case "${opt}" in
        H ) HOST_NAME=$OPTARG;;
        s ) PASSW0RD=$OPTARG;;
        p ) PORT=$OPTARG;;
        l ) DISC=$OPTARG;;
        w ) WARN_THRESHOLD=$OPTARG;;
        c ) CRITICAL_THRESHOLD=$OPTARG;;
        \?) echo "Invalid option: -$OPTARG" >&2; exit 1 ;;
        : ) echo "Option -$OPTARG requires an argument." >&2; exit 1;;
    esac
done
shift $((OPTIND-1))

if [[ -z "${HOST_NAME}" ]] || [[ -z "${PASSW0RD}" ]] || [[ -z "${PORT}" ]] || [[ -z "${DISC}" ]] || [[ -z "${WARN_THRESHOLD}" ]] || [[ -z "${CRITICAL_THRESHOLD}" ]] ; then
    usage
fi

CHECKRESULT=`/usr/local/nagios/libexec/check_nt -H ${HOST_NAME} -p ${PORT} -s ${PASSW0RD} -v USEDDISKSPACE -l ${DISC}`
PERFDATA=`echo ${CHECKRESULT} | awk -F"- " '{ print $4 }' | awk -F "|" '{ print $2 }'`
FREESPACE=`echo ${CHECKRESULT} | awk -F"- " '{ print $4 }' | awk -F "|" '{ print $1 }'`
USEDSPACE=`echo ${CHECKRESULT} | awk -F"- " '{ print $3 }'`
TOTALSPACE=`echo ${CHECKRESULT} | awk -F"- " '{ print $2 }'`

if [[ -z ${FREESPACE} ]]; then
    ## Command failed or server offline
    echo "ERROR ERROR: Command failed"
    exit 1
fi

SIZE=`echo $FREESPACE | awk '{ print $2 }'`
UNIT=`echo $FREESPACE | awk '{ print $3 }'`

if [[ ${UNIT} == "Gb" ]]; then
    SIZE=`echo ${SIZE} \* 1024 | bc`
fi

if [[ `echo "${SIZE} >= ${WARN_THRESHOLD}" | bc` -eq 1 ]]; then
    echo "${DISC}:\ OK - ${TOTALSPACE} - ${USEDSPACE} - ${FREESPACE} | ${PERFDATA}"
    exit 0
elif [[ `echo "${SIZE} < ${WARN_THRESHOLD}" | bc` -eq 1 && `echo "${SIZE} > ${CRITICAL_THRESHOLD}" | bc` -eq 1 ]]; then
    echo "${DISC}:\ WARNING - ${TOTALSPACE} - ${USEDSPACE} - ${FREESPACE} | ${PERFDATA}"
    exit 1
elif [[ `echo "${SIZE} <= ${CRITICAL_THRESHOLD}" | bc` -eq 1 ]]; then
    echo "${DISC}:\ CRITICAL - ${TOTALSPACE} - ${USEDSPACE} - ${FREESPACE} | ${PERFDATA}"
    exit 2
fi

I then did chmod +x check_disk_by_size.sh (which I believe I need to do to make it executable, but I may be wrong)

The syntax to run a manual check is stated as (for example):

Code: Select all

./check_disk_by_size.sh -s "NyNagiosPassword" -H 192.168.999.999 -l c -w 14240 -c 5120
but when I try this with one of my servers, I get:

Code: Select all

./check_disk_by_size.sh -H host [-s password] [-p port] [-w warning] [-c critical] [-l params]
Indicating that my syntax is wrong somewhere - can anyone suggest where I'm going wrong, please - or should I just download the approved plugin from https://exchange.nagios.org/directory/P ... ce/details ?

Cheers and thanks in advance!
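A note on the usage output above, reading the script as posted: the check after the getopts loop treats -s and -p as mandatory too, even though the usage line prints them in brackets as optional. The example command passes no -p, so PORT stays empty and usage() is called. The sketch below replays only that argument check; missing_args is a hypothetical helper name, not part of the original script.

```shell
#!/bin/bash
# Replays only the option parsing and the emptiness check from
# check_disk_by_size.sh. missing_args is a hypothetical helper name.
missing_args() {
    local OPTIND=1 opt missing=""
    local HOST_NAME="" PASSW0RD="" PORT="" DISC="" WARN_THRESHOLD="" CRITICAL_THRESHOLD=""
    while getopts ":H:s:p:l:w:c:" opt; do
        case "${opt}" in
            H ) HOST_NAME=$OPTARG ;;
            s ) PASSW0RD=$OPTARG ;;
            p ) PORT=$OPTARG ;;
            l ) DISC=$OPTARG ;;
            w ) WARN_THRESHOLD=$OPTARG ;;
            c ) CRITICAL_THRESHOLD=$OPTARG ;;
            * ) ;;  # anything else is ignored in this sketch
        esac
    done
    # The original script calls usage() if ANY of these is empty.
    [[ -z "${HOST_NAME}" ]] && missing+=" -H"
    [[ -z "${PASSW0RD}" ]] && missing+=" -s"
    [[ -z "${PORT}" ]] && missing+=" -p"
    [[ -z "${DISC}" ]] && missing+=" -l"
    [[ -z "${WARN_THRESHOLD}" ]] && missing+=" -w"
    [[ -z "${CRITICAL_THRESHOLD}" ]] && missing+=" -c"
    echo "${missing# }"
}

# Same arguments as the manual check quoted above; lists the options
# the script would consider missing.
missing_args -s "NyNagiosPassword" -H 192.168.999.999 -l c -w 14240 -c 5120
```

If that reading is right, adding the port to the command should get past the usage message. The netstat output later in this thread shows NSClient++ answering on 12489, so -p 12489 is the likely value here.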
neworderfac33
Posts: 329
Joined: Fri Jul 24, 2015 11:04 am

Re: Monitoring Disk Space by Size, rather than by percentage

Post by neworderfac33 »

It appears that the plugin I referred to is only for monitoring Linux servers, so I'm back to trying to resolve my original question.

Anyway, off until the New Year now - so to everyone reading, and especially to those who've helped me throughout the year - Merry Christmas!
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Monitoring Disk Space by Size, rather than by percentage

Post by rkennedy »

Take a look at this document written by @Box293 which explains how to do so using NSClient++ - http://sites.box293.com/nagios/guides/c ... disk-usage
Former Nagios Employee
neworderfac33
Posts: 329
Joined: Fri Jul 24, 2015 11:04 am

Re: Monitoring Disk Space by Size, rather than by percentage

Post by neworderfac33 »

Good afternoon, and Happy New Year to you all!

As suggested, from the Nagios Master, I tried:

Code: Select all

./check_nrpe -H 99.99.99.99 -t 30 -c check_drivesize -a drive=C: 'warning=free<10G' 'critical=free<5G' show-all 'perf-config=*(unit:G)' detail-syntax='{${drive_or_name} ${free} free / ${size} total}' top-syntax='${status}: ${problem_list}'
which returned:

Code: Select all

connect to address 99.99.99.99 port 5666: Connection refused
Any suggestions?

Thanks in advance

Pete
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Monitoring Disk Space by Size, rather than by percentage

Post by rkennedy »

You'll need to make sure NRPE is turned on in NSClient++. Look for something like this under the [/modules] section of your configuration file, and if it doesn't exist, add it there -

Code: Select all

; NRPEServer - A server that listens for incoming NRPE connection and processes incoming requests.
NRPEServer = 1
You will then need a section for NRPE (if it doesn't exist) -

Code: Select all

; Undocumented section
[/settings/NRPE/server]
ssl options = 
verify mode = none
insecure = true
allow arguments = true
Then restart the NSClient++ service (nscp).
Former Nagios Employee
neworderfac33
Posts: 329
Joined: Fri Jul 24, 2015 11:04 am

Re: Monitoring Disk Space by Size, rather than by percentage

Post by neworderfac33 »

Good afternoon - thank you for your reply!
I now have:

Code: Select all

; NRPEServer - A server that listens for incoming NRPE connection and processes incoming requests.
NRPEServer = 1

; Undocumented section
[/modules]

[/settings/NRPE/server]
ssl options = 
verify mode = none
insecure = true
allow arguments = true
in my NSCLIENT++.ini

However, having restarted the NSClient++ (x64) service on the target server,

Code: Select all

 ./check_nrpe -H 99.99.99.99 -t 30 -c check_drivesize -a drive=C: 'warning=free<10G' 'critical=free<5G' show-all 'perf-config=*(unit:G)' detail-syntax='{${drive_or_name} ${free} free / ${size} total}' top-syntax='${status}: ${problem_list}'
still returns:

Code: Select all

connect to address 99.99.99.99 port 5666: Connection refused
I assume that this solution means that I'm going to have to modify the .ini file and restart the service on each of my machines?

Finally, and probably a stupid question, do I have to install NRPE on the remote machine too (like I do with my remote Linux boxes), or just the NAGIOS Master?

Thanks - end of the day now, so off home - apologies if you reply and don't get a response until tomorrow afternoon.

Pete
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Monitoring Disk Space by Size, rather than by percentage

Post by rkennedy »

The NRPE server is part of NSClient++ on the client end, so there is nothing extra to install on the Windows machine.

What is the output of netstat -an | findstr 5666 on the Windows machine you're attempting to monitor?

Please also post the full nsclient.ini for us to review - additionally, what version of NSClient++ are you using?
Former Nagios Employee
neworderfac33
Posts: 329
Joined: Fri Jul 24, 2015 11:04 am

Re: Monitoring Disk Space by Size, rather than by percentage

Post by neworderfac33 »

Code: Select all

C:\Program Files\NSClient++>netstat -an | findstr 5666
  TCP    88.88.88.88:12489      99.99.99.99:56664     TIME_WAIT
Where 88.88.88.88 is the IP of the target machine and 99.99.99.99 is the IP of the NAGIOS Master
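One thing worth noticing in that output: findstr 5666 is a substring match, and the only line it found is a connection on local port 12489, matched because the remote port 56664 happens to contain 5666. No line shows anything LISTENING on 5666, which fits the "Connection refused" from check_nrpe. A stricter filter could look like the sketch below; has_listener is a hypothetical helper, and it assumes the netstat -an output has been saved or piped to a machine with bash and awk.

```shell
#!/bin/bash
# has_listener PORT: reads Windows `netstat -an` output on stdin and succeeds
# only if a TCP socket is LISTENING on exactly that local port.
# (Hypothetical helper for illustration.)
has_listener() {
    awk -v port="$1" '
        { sub(/\r$/, "") }   # tolerate Windows line endings
        $1 == "TCP" && $2 ~ (":" port "$") && $4 == "LISTENING" { found = 1 }
        END { exit !found }
    '
}

# The line from the post above: it mentions 56664, not a listener on 5666.
if printf '  TCP    88.88.88.88:12489      99.99.99.99:56664     TIME_WAIT\n' | has_listener 5666; then
    echo "listening on 5666"
else
    echo "nothing listening on 5666"
fi
```

Matching on the local-address column and the LISTENING state avoids false hits from ephemeral ports that merely contain the digits.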

Here's the NSClient.ini

Code: Select all

# If you want to fill this file with all avalible options run the following command:
#   nscp settings --generate --add-defaults --load-all
# If you want to activate a module and bring in all its options use:
#   nscp settings --activate-module <MODULE NAME> --add-defaults
# For details run: nscp settings --help


; Undocumented section
[/settings/default]

; ALLOWED HOSTS - A comaseparated list of allowed hosts. You can use netmasks (/ syntax) or * to create ranges.
allowed hosts = 99.99.99.99


; Undocumented section
[/modules]

; Undocumented key
Scheduler = 0

; CheckSystem - Various system related checks, such as CPU load, process state, service state memory usage and PDH counters.
CheckSystem = 1

; NSClientServer - A server that listens for incoming check_nt connection and processes incoming requests.
NSClientServer = 1

; CheckExternalScripts - Execute external scripts
CheckExternalScripts = 1

; CheckHelpers - Various helper function to extend other checks.
CheckHelpers = 1

; CheckEventLog - Check for errors and warnings in the event log.
CheckEventLog = 1

; CheckNSCP - Use this module to check the healt and status of NSClient++ it self
CheckNSCP = 1

; CheckDisk - CheckDisk can check various file and disk related things.
CheckDisk = 1


; A list of templates for wrapped scripts.
[/settings/external scripts/wrappings]

; WRAPPING - An external script wrapping
ps1 = cmd /c echo scripts\\%SCRIPT% %ARGS%; exit($lastexitcode) | powershell.exe -command -

; WRAPPING - An external script wrapping
bat = scripts\\%SCRIPT% %ARGS%

; WRAPPING - An external script wrapping
An alias is an internal command that has been predefined to provide a single command without arguments. Be careful so you don't create loops (ie check_loop = check_a, check_a=check_loop)

; WRAPPING - An external script wrapping
vbs = cscript.exe //T:30 //NoLogo scripts\\lib\\wrapper.vbs %SCRIPT% %ARGS%


[/settings/external scripts/alias]

; ALIAS - Query alias
alias_volumes_loose = check_drivesize

; ALIAS - Query alias
alias_volumes = check_drivesize

; ALIAS - Query alias
alias_sched_all = check_tasksched show-all "syntax=${title}: ${exit_code}" "crit=exit_code ne 0"

; ALIAS - Query alias
alias_process_stopped = check_process "process=$ARG1$" "crit=state != 'stopped'"

; ALIAS - Query alias
alias_service = check_service

; ALIAS - Query alias
alias_process_hung = check_process "filter=is_hung" "crit=count>0"

; ALIAS - Query alias
alias_process_count = check_process "process=$ARG1$" "warn=count > $ARG2$" "crit=count > $ARG3$"

; ALIAS - Query alias
alias_process = check_process "process=$ARG1$" "crit=state != 'started'"

; ALIAS - Query alias
alias_mem = check_memory

; ALIAS - Query alias
alias_file_size = check_files "path=$ARG1$" "crit=size > $ARG2$" "top-syntax=${list}" "detail-syntax=${filename] ${size}" max-dir-depth=10

; ALIAS - Query alias
alias_event_log = check_eventlog

; ALIAS - Query alias
alias_service_ex = check_service "exclude=Net Driver HPZ12" "exclude=Pml Driver HPZ12" exclude=stisvc

; ALIAS - Query alias
alias_disk = check_drivesize

; ALIAS - Query alias
alias_file_age = check_files "path=$ARG1$" "crit=written > $ARG2$" "top-syntax=${list}" "detail-syntax=${filename] ${written}" max-dir-depth=10

; ALIAS - Query alias
alias_cpu_ex = check_cpu "warn=load > $ARG1$" "crit=load > $ARG2$" time=5m time=1m time=30s

; ALIAS - Query alias
alias_cpu = check_cpu

; ALIAS - Query alias
alias_up = check_uptime

; ALIAS - Query alias
alias_disk_loose = check_drivesize

; ALIAS - Query alias
alias_sched_task = check_tasksched show-all "filter=title eq '$ARG1$'" "detail-syntax=${title} (${exit_code})" "crit=exit_code ne 0"

; ALIAS - Query alias
alias_sched_long = check_tasksched "filter=status = 'running'" "detail-syntax=${title} (${most_recent_run_time})" "crit=most_recent_run_time < -$ARG1$"

; NRPEServer - A server that listens for incoming NRPE connection and processes incoming requests.
NRPEServer = 1

; Undocumented section
[/settings/NRPE/server]
ssl options = 
verify mode = none
insecure = true
allow arguments = true
The NSClient++ version is: 0.4.3.143-x64

Thanks

Pete
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Monitoring Disk Space by Size, rather than by percentage

Post by rkennedy »

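The important part is placement: NRPEServer = 1 has to sit under the [/modules] section, but in your file it ended up at the bottom of [/settings/external scripts/alias], so the module was never loaded. Use the NSClient++ configuration file posted below. The only change is that I moved the NRPEServer line to be under [/modules] -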

Code: Select all

# If you want to fill this file with all avalible options run the following command:
#   nscp settings --generate --add-defaults --load-all
# If you want to activate a module and bring in all its options use:
#   nscp settings --activate-module <MODULE NAME> --add-defaults
# For details run: nscp settings --help


; Undocumented section
[/settings/default]

; ALLOWED HOSTS - A comaseparated list of allowed hosts. You can use netmasks (/ syntax) or * to create ranges.
allowed hosts = 99.99.99.99


; Undocumented section
[/modules]

; Undocumented key
Scheduler = 0

; CheckSystem - Various system related checks, such as CPU load, process state, service state memory usage and PDH counters.
CheckSystem = 1

; NSClientServer - A server that listens for incoming check_nt connection and processes incoming requests.
NSClientServer = 1

; CheckExternalScripts - Execute external scripts
CheckExternalScripts = 1

; CheckHelpers - Various helper function to extend other checks.
CheckHelpers = 1

; CheckEventLog - Check for errors and warnings in the event log.
CheckEventLog = 1

; CheckNSCP - Use this module to check the healt and status of NSClient++ it self
CheckNSCP = 1

; CheckDisk - CheckDisk can check various file and disk related things.
CheckDisk = 1

; NRPEServer - A server that listens for incoming NRPE connection and processes incoming requests.
NRPEServer = 1


; A list of templates for wrapped scripts.
[/settings/external scripts/wrappings]

; WRAPPING - An external script wrapping
ps1 = cmd /c echo scripts\\%SCRIPT% %ARGS%; exit($lastexitcode) | powershell.exe -command -

; WRAPPING - An external script wrapping
bat = scripts\\%SCRIPT% %ARGS%

; WRAPPING - An external script wrapping
An alias is an internal command that has been predefined to provide a single command without arguments. Be careful so you don't create loops (ie check_loop = check_a, check_a=check_loop)

; WRAPPING - An external script wrapping
vbs = cscript.exe //T:30 //NoLogo scripts\\lib\\wrapper.vbs %SCRIPT% %ARGS%


[/settings/external scripts/alias]

; ALIAS - Query alias
alias_volumes_loose = check_drivesize

; ALIAS - Query alias
alias_volumes = check_drivesize

; ALIAS - Query alias
alias_sched_all = check_tasksched show-all "syntax=${title}: ${exit_code}" "crit=exit_code ne 0"

; ALIAS - Query alias
alias_process_stopped = check_process "process=$ARG1$" "crit=state != 'stopped'"

; ALIAS - Query alias
alias_service = check_service

; ALIAS - Query alias
alias_process_hung = check_process "filter=is_hung" "crit=count>0"

; ALIAS - Query alias
alias_process_count = check_process "process=$ARG1$" "warn=count > $ARG2$" "crit=count > $ARG3$"

; ALIAS - Query alias
alias_process = check_process "process=$ARG1$" "crit=state != 'started'"

; ALIAS - Query alias
alias_mem = check_memory

; ALIAS - Query alias
alias_file_size = check_files "path=$ARG1$" "crit=size > $ARG2$" "top-syntax=${list}" "detail-syntax=${filename] ${size}" max-dir-depth=10

; ALIAS - Query alias
alias_event_log = check_eventlog

; ALIAS - Query alias
alias_service_ex = check_service "exclude=Net Driver HPZ12" "exclude=Pml Driver HPZ12" exclude=stisvc

; ALIAS - Query alias
alias_disk = check_drivesize

; ALIAS - Query alias
alias_file_age = check_files "path=$ARG1$" "crit=written > $ARG2$" "top-syntax=${list}" "detail-syntax=${filename] ${written}" max-dir-depth=10

; ALIAS - Query alias
alias_cpu_ex = check_cpu "warn=load > $ARG1$" "crit=load > $ARG2$" time=5m time=1m time=30s

; ALIAS - Query alias
alias_cpu = check_cpu

; ALIAS - Query alias
alias_up = check_uptime

; ALIAS - Query alias
alias_disk_loose = check_drivesize

; ALIAS - Query alias
alias_sched_task = check_tasksched show-all "filter=title eq '$ARG1$'" "detail-syntax=${title} (${exit_code})" "crit=exit_code ne 0"

; ALIAS - Query alias
alias_sched_long = check_tasksched "filter=status = 'running'" "detail-syntax=${title} (${most_recent_run_time})" "crit=most_recent_run_time < -$ARG1$"


; Undocumented section
[/settings/NRPE/server]
ssl options =
verify mode = none
insecure = true
allow arguments = true
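Since the fix comes down to which section the NRPEServer = 1 line sits in, one way to check a copy of the file from another machine is to print the section each occurrence of a key belongs to. The sketch below is a hypothetical helper (section_of_key is not an NSClient++ tool) and assumes a plain INI layout like the one above, with ; and # comment lines.

```shell
#!/bin/bash
# section_of_key KEY: reads an INI file on stdin and prints the section
# header that each occurrence of KEY falls under.
# (Hypothetical helper for illustration.)
section_of_key() {
    awk -v key="$1" '
        { sub(/\r$/, "") }                       # tolerate Windows line endings
        /^[ \t]*\[.*\][ \t]*$/ {                 # section header, e.g. [/modules]
            gsub(/^[ \t]+|[ \t]+$/, "")
            section = $0
            next
        }
        /^[ \t]*[;#]/ { next }                   # skip comment lines
        {
            split($0, kv, "=")                   # key is everything before the first =
            k = kv[1]
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            if (k == key) print section
        }
    '
}

# Shortened version of the misplaced layout from the earlier post:
section_of_key NRPEServer <<'EOF'
[/modules]
CheckDisk = 1

[/settings/external scripts/alias]
alias_cpu = check_cpu
; NRPEServer - A server that listens for incoming NRPE connection
NRPEServer = 1
EOF
```

For a real file, redirect it into the function, for example section_of_key NRPEServer < nsclient.ini. With the corrected layout above, the expected section is [/modules].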
Former Nagios Employee
neworderfac33
Posts: 329
Joined: Fri Jul 24, 2015 11:04 am

Re: Monitoring Disk Space by Size, rather than by percentage

Post by neworderfac33 »

That works just fine, thank you!

Well, it does at least when I issue the command:

Code: Select all

./check_nrpe -H 99.99.99.99 -t 30 -c CheckDriveSize -a ShowAll MinWarn=10G MinCrit=5G Drive=C: perf-unit=G
which returns:

Code: Select all

OK C:: Total: 39.656GB - Used: 16.175GB (41%) - Free: 23.481GB (59%)|'C: free'=23.48132G;10;5;0;39.65624 'C: free %'=59%;25;12;0;100
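The part after the | is standard Nagios performance data in the form 'label'=value;warn;crit;min;max, so the 10 and 5 in the 'C: free' entry suggest that MinWarn=10G and MinCrit=5G were applied as intended. A small sketch for pulling those fields out of a plugin output line (perf_fields is a hypothetical helper name):

```shell
#!/bin/bash
# perf_fields OUTPUT LABEL: print value, warn and crit for one perfdata label.
# (Hypothetical helper for illustration.)
perf_fields() {
    local perf=${1#*|}          # keep only the perfdata after the first |
    printf '%s\n' "$perf" |
        grep -o "'$2'=[^ ]*" |  # isolate the entry for the quoted label
        awk -F'[=;]' '{ print "value=" $2, "warn=" $3, "crit=" $4 }'
}

# The output line from the post above:
out="OK C:: Total: 39.656GB - Used: 16.175GB (41%) - Free: 23.481GB (59%)|'C: free'=23.48132G;10;5;0;39.65624 'C: free %'=59%;25;12;0;100"
perf_fields "$out" "C: free"
```

Because the label is matched together with its closing quote, 'C: free' does not accidentally pick up the separate 'C: free %' entry.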
However, I'm running NSClient++ 0.4.3.143-x64, which, according to the documentation, should allow me to run:

Code: Select all

check_nrpe -H 99.99.99.99 -t 30 -c check_drivesize -a drive=C: 'warning=free<10G' 'critical=free<5G' show-all 'perf-config=*(unit:G)' detail-syntax='{${drive_or_name} ${free} free / ${size} total}' top-syntax='${status}: ${problem_list}'
but when I do, it returns:

Code: Select all

Exception processing request: Request command contained illegal metachars!
It's not THAT much of an issue as the earlier command gives me what I want - just wondering if you can see anything obvious?

Also, any chance you could point me in the direction of the syntax for using nrpe within my services.cfg file, please? I currently have:

Code: Select all

define service {
       use                     generic-service,nagiosgraph
       host_name               MyServer
       #hostgroup_name          001-jenkins-live-masters
       service_description     Drive Space - C -Amount
       check_command           check_nrpe!CheckDriveSize! "w10 -c5"
       }
which I THOUGHT would give a warning when 10GB remained and a critical when 5GB remained - however, the Nagios web interface reports:

Code: Select all

WARNING : Total: 349.996MB - Used: 290.445MB (83%) - Free: 59.551MB (17%) 
This is puzzling enough, but a similar command:

Code: Select all

define service{
       use                      generic-service,nagiosgraph
       host_name                MN2WME12099U
       #hostgroup_name           001-jenkins-live-masters
       service_description      Drive Space - C - check_nt
       check_command            check_nt!USEDDISKSPACE!-l c -w 90 -c 95
       }
returns:

Code: Select all

c: - total: 39.66 Gb - used: 16.17 Gb (41%) - free 23.48 Gb (59%) 
I know that the values returned by the check_nt command are correct, so why are the values reported by check_nrpe not?

Thank you! :-)

Pete

P.S. Totally out of my comfort zone for a 54 year old Brit, but you guys rock! :-)