Monitoring Disk Space by Size, rather than by percentage

Posted: Thu Dec 22, 2016 10:25 am
by neworderfac33
Good afternoon - one last one before Christmas..!

I want to be able to monitor disk space by available size, rather than available percentage.

I have Googled and found the following solution at http://serverfault.com/questions/309913 ... heck-nt-co

As suggested, I used VIM to create /usr/local/nagios/libexec/check_disk_by_size.sh which looks like this (the third version of the script)

Code: Select all

#!/bin/bash
# Date: 2015-06-30
# Purpose: A wrapper script for check_nt to set threshold for exact space
#          free rather than just percentage.  Useful on VERY large drives.
# Example: check_disk_by_size.sh -H 192.168.0.1 -l c -w 10240 -c 5120

usage() { echo "$0 -H host [-s password] [-p port] [-w warning] [-c critical] [-l params]" 1>&2; exit 1; }

while getopts ":H:s:p:l:w:c:" opt; do
    case "${opt}" in
        H ) HOST_NAME=$OPTARG;;
        s ) PASSW0RD=$OPTARG;;
        p ) PORT=$OPTARG;;
        l ) DISC=$OPTARG;;
        w ) WARN_THRESHOLD=$OPTARG;;
        c ) CRITICAL_THRESHOLD=$OPTARG;;
        \?) echo "Invalid option: -$OPTARG" >&2; exit 1 ;;
        : ) echo "Option -$OPTARG requires an argument." >&2; exit 1;;
    esac
done
shift $((OPTIND-1))

if [[ -z "${HOST_NAME}" ]] || [[ -z "${PASSW0RD}" ]] || [[ -z "${PORT}" ]] || [[ -z "${DISC}" ]] || [[ -z "${WARN_THRESHOLD}" ]] || [[ -z "${CRITICAL_THRESHOLD}" ]] ; then
    usage
fi

CHECKRESULT=`/usr/local/nagios/libexec/check_nt -H ${HOST_NAME} -p ${PORT} -s ${PASSW0RD} -v USEDDISKSPACE -l ${DISC}`
PERFDATA=`echo ${CHECKRESULT} | awk -F"- " '{ print $4 }' | awk -F "|" '{ print $2 }'`
FREESPACE=`echo ${CHECKRESULT} | awk -F"- " '{ print $4 }' | awk -F "|" '{ print $1 }'`
USEDSPACE=`echo ${CHECKRESULT} | awk -F"- " '{ print $3 }'`
TOTALSPACE=`echo ${CHECKRESULT} | awk -F"- " '{ print $2 }'`

if [[ -z ${FREESPACE} ]]; then
    ## Command failed or server offline: report UNKNOWN (exit 3), not WARNING (exit 1)
    echo "UNKNOWN: check_nt command failed"
    exit 3
fi

SIZE=`echo $FREESPACE | awk '{ print $2 }'`
UNIT=`echo $FREESPACE | awk '{ print $3 }'`

if [[ ${UNIT} == "Gb" ]]; then
    SIZE=`echo ${SIZE} \* 1024 | bc`
fi

if [[ `echo "${SIZE} >= ${WARN_THRESHOLD}" | bc` -eq 1 ]]; then
    echo "${DISC}:\ OK - ${TOTALSPACE} - ${USEDSPACE} - ${FREESPACE} | ${PERFDATA}"
    exit 0
elif [[ `echo "${SIZE} < ${WARN_THRESHOLD}" | bc` -eq 1 && `echo "${SIZE} > ${CRITICAL_THRESHOLD}" | bc` -eq 1 ]]; then
    echo "${DISC}:\ WARNING - ${TOTALSPACE} - ${USEDSPACE} - ${FREESPACE} | ${PERFDATA}"
    exit 1
elif [[ `echo "${SIZE} <= ${CRITICAL_THRESHOLD}" | bc` -eq 1 ]]; then
    echo "${DISC}:\ CRITICAL - ${TOTALSPACE} - ${USEDSPACE} - ${FREESPACE} | ${PERFDATA}"
    exit 2
fi

I then did chmod +x check_disk_by_size.sh (which I believe I need to do to make it executable, but I may be wrong)
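
To see what the awk calls in the script extract, the parsing can be tried on a sample line. This is a minimal sketch, assuming check_nt prints a line shaped like the check_nt result quoted later in this thread; free_mb is a hypothetical helper name, and awk does the multiplication instead of bc so the sketch has no extra dependency.

```shell
#!/bin/bash
# Sketch only: mirrors the wrapper's field splitting on a check_nt-style line.
# free_mb is a hypothetical helper, not part of the original script.
free_mb() {
    local free size unit
    # Field 4 (split on "- ") holds "free <size> <unit> (<pct>%)", before any "|" perfdata
    free=$(echo "$1" | awk -F"- " '{ print $4 }' | awk -F"|" '{ print $1 }')
    size=$(echo "$free" | awk '{ print $2 }')
    unit=$(echo "$free" | awk '{ print $3 }')
    # Normalise to MB so it can be compared with thresholds such as -w 10240
    if [ "$unit" = "Gb" ]; then
        size=$(awk -v s="$size" 'BEGIN { printf "%.2f\n", s * 1024 }')
    fi
    echo "$size"
}

# Sample line copied from the check_nt output shown later in the thread
free_mb 'c: - total: 39.66 Gb - used: 16.17 Gb (41%) - free 23.48 Gb (59%)'   # prints 24043.52
```

With that sample, 24043.52 MB free is above a 10240 warning threshold, so the wrapper would take its OK branch.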

The syntax to run a manual check is stated as (for example):

Code: Select all

./check_disk_by_size.sh -s "NyNagiosPassword" -H 192.168.999.999 -l c -w 14240 -c 5120
but when I try this with one of my servers, I get:

Code: Select all

./check_disk_by_size.sh -H host [-s password] [-p port] [-w warning] [-c critical] [-l params]
Indicating that my syntax is wrong somewhere - can anyone suggest where I'm going wrong, please - or should I just download the approved plugin from https://exchange.nagios.org/directory/P ... ce/details ?

Cheers and thanks in advance!
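
One possible reason for the usage message: the check that follows getopts in the script treats every option as required, including -s and -p, even though the usage text shows them in brackets. The example command passes no -p. A minimal sketch, with missing_opts as a hypothetical helper that reuses the script's getopts string, shows which options that check would find empty:

```shell
#!/bin/bash
# Sketch only: missing_opts is a hypothetical helper, not part of the original script.
# It parses the same option string and lists every option left empty.
missing_opts() {
    local opt host="" pass="" port="" disc="" warn="" crit="" missing=""
    OPTIND=1   # reset so the function can be called more than once
    while getopts ":H:s:p:l:w:c:" opt; do
        case "${opt}" in
            H) host=$OPTARG ;;
            s) pass=$OPTARG ;;
            p) port=$OPTARG ;;
            l) disc=$OPTARG ;;
            w) warn=$OPTARG ;;
            c) crit=$OPTARG ;;
            *) ;;   # invalid options are ignored in this sketch
        esac
    done
    [ -n "${host}" ] || missing="${missing} -H"
    [ -n "${pass}" ] || missing="${missing} -s"
    [ -n "${port}" ] || missing="${missing} -p"
    [ -n "${disc}" ] || missing="${missing} -l"
    [ -n "${warn}" ] || missing="${missing} -w"
    [ -n "${crit}" ] || missing="${missing} -c"
    echo "${missing# }"
}

# Same shape as the manual check above, with placeholder values
missing_opts -s "secret" -H 192.168.0.1 -l c -w 14240 -c 5120   # prints -p
```

If that is the cause, adding -p with the port NSClient++ listens on for check_nt should get past the usage message (the netstat output later in the thread shows 12489 on the target).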

Re: Monitoring Disk Space by Size, rather than by percentage

Posted: Thu Dec 22, 2016 12:40 pm
by neworderfac33
It appears that the plugin I referred to is only for monitoring Linux servers, so I'm back to trying to resolve my original question.

Anyway, off until the New Year now - so to everyone reading, and especially to those who've helped me throughout the year - Merry Christmas!

Re: Monitoring Disk Space by Size, rather than by percentage

Posted: Thu Dec 22, 2016 1:41 pm
by rkennedy
Take a look at this document written by @Box293 which explains how to do so using NSClient++ - http://sites.box293.com/nagios/guides/c ... disk-usage

Re: Monitoring Disk Space by Size, rather than by percentage

Posted: Tue Jan 03, 2017 9:33 am
by neworderfac33
Good afternoon, and Happy New Year to you all!

As suggested, from the Nagios Master, I tried:

Code: Select all

./check_nrpe -H 99.99.99.99 -t 30 -c check_drivesize -a drive=C: 'warning=free<10G' 'critical=free<5G' show-all 'perf-config=*(unit:G)' detail-syntax='{${drive_or_name} ${free} free / ${size} total}' top-syntax='${status}: ${problem_list}'
which returned:

Code: Select all

connect to address 99.99.99.99 port 5666: Connection refused
Any suggestions?

Thanks in advance

Pete

Re: Monitoring Disk Space by Size, rather than by percentage

Posted: Tue Jan 03, 2017 11:42 am
by rkennedy
You'll need to make sure you have NRPE turned on in NSClient++. Look for something like this in your configuration file, and if it doesn't exist, add it -

Code: Select all

; NRPEServer - A server that listens for incoming NRPE connection and processes incoming requests.
NRPEServer = 1
You will then need a section for NRPE (if it doesn't exist) -

Code: Select all

; Undocumented section
[/settings/NRPE/server]
ssl options = 
verify mode = none
insecure = true
allow arguments = true
Then restart the NSClient++ service (nscp).

Re: Monitoring Disk Space by Size, rather than by percentage

Posted: Tue Jan 03, 2017 12:41 pm
by neworderfac33
Good afternoon - thank you for your reply!
I now have:

Code: Select all

; NRPEServer - A server that listens for incoming NRPE connection and processes incoming requests.
NRPEServer = 1

; Undocumented section
[/modules]

[/settings/NRPE/server]
ssl options = 
verify mode = none
insecure = true
allow arguments = true
in my NSCLIENT++.ini

However, having restarted the NSClient++ (x64) service on the target server,

Code: Select all

 ./check_nrpe -H 99.99.99.99 -t 30 -c check_drivesize -a drive=C: 'warning=free<10G' 'critical=free<5G' show-all 'perf-config=*(unit:G)' detail-syntax='{${drive_or_name} ${free} free / ${size} total}' top-syntax='${status}: ${problem_list}'
still returns:

Code: Select all

connect to address 99.99.99.99 port 5666: Connection refused
I assume that this solution means that I'm going to have to modify the .ini file and restart the service on each of my machines?

Finally, and probably a stupid question, do I have to install NRPE on the remote machine too (like I do with my remote Linux boxes), or just the NAGIOS Master?

Thanks - end of the day now, so off home - apologies if you reply and don't get a response until tomorrow afternoon.

Pete

Re: Monitoring Disk Space by Size, rather than by percentage

Posted: Tue Jan 03, 2017 12:44 pm
by rkennedy
NRPE gets installed with NSClient++ on the client end.

What is the output of netstat -an | findstr 5666 on the Windows machine you're attempting to monitor?

Please also post the full nsclient.ini for us to review - additionally, what version of NSClient++ are you using?

Re: Monitoring Disk Space by Size, rather than by percentage

Posted: Wed Jan 04, 2017 4:18 am
by neworderfac33

Code: Select all

C:\Program Files\NSClient++>netstat -an | findstr 5666
  TCP    88.88.88.88:12489      99.99.99.99:56664     TIME_WAIT
Where 88.88.88.88 is the IP of the target machine and 99.99.99.99 is the IP of the NAGIOS Master
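
Note that the only match above is the remote port 56664 on a check_nt connection (local port 12489); no line shows a socket listening on 5666. A small sketch, assuming the netstat -an output has been saved or piped as text and using a hypothetical helper is_listening, that matches only a LISTENING socket whose local port is exactly the one asked for:

```shell
#!/bin/bash
# Sketch only: is_listening is a hypothetical helper.
# Reads `netstat -an` text on stdin; succeeds if a TCP socket is LISTENING on port $1.
is_listening() {
    awk -v port="$1" '
        # Column 2 is the local address; require it to end in ":<port>" exactly
        $1 == "TCP" && $NF == "LISTENING" && $2 ~ (":" port "$") { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# The line returned by findstr in the post above
sample='  TCP    88.88.88.88:12489      99.99.99.99:56664     TIME_WAIT'
if printf '%s\n' "$sample" | is_listening 5666; then
    echo "listening on 5666"
else
    echo "nothing listening on 5666"   # this branch is taken for the sample
fi
```

A plain findstr 5666 also matches ports such as 56664, which is why the filter anchors on the local address column and the LISTENING state.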

Here's the NSClient.ini

Code: Select all

# If you want to fill this file with all avalible options run the following command:
#   nscp settings --generate --add-defaults --load-all
# If you want to activate a module and bring in all its options use:
#   nscp settings --activate-module <MODULE NAME> --add-defaults
# For details run: nscp settings --help


; Undocumented section
[/settings/default]

; ALLOWED HOSTS - A comaseparated list of allowed hosts. You can use netmasks (/ syntax) or * to create ranges.
allowed hosts = 99.99.99.99


; Undocumented section
[/modules]

; Undocumented key
Scheduler = 0

; CheckSystem - Various system related checks, such as CPU load, process state, service state memory usage and PDH counters.
CheckSystem = 1

; NSClientServer - A server that listens for incoming check_nt connection and processes incoming requests.
NSClientServer = 1

; CheckExternalScripts - Execute external scripts
CheckExternalScripts = 1

; CheckHelpers - Various helper function to extend other checks.
CheckHelpers = 1

; CheckEventLog - Check for errors and warnings in the event log.
CheckEventLog = 1

; CheckNSCP - Use this module to check the healt and status of NSClient++ it self
CheckNSCP = 1

; CheckDisk - CheckDisk can check various file and disk related things.
CheckDisk = 1


; A list of templates for wrapped scripts.
[/settings/external scripts/wrappings]

; WRAPPING - An external script wrapping
ps1 = cmd /c echo scripts\\%SCRIPT% %ARGS%; exit($lastexitcode) | powershell.exe -command -

; WRAPPING - An external script wrapping
bat = scripts\\%SCRIPT% %ARGS%

; WRAPPING - An external script wrapping
An alias is an internal command that has been predefined to provide a single command without arguments. Be careful so you don't create loops (ie check_loop = check_a, check_a=check_loop)

; WRAPPING - An external script wrapping
vbs = cscript.exe //T:30 //NoLogo scripts\\lib\\wrapper.vbs %SCRIPT% %ARGS%


[/settings/external scripts/alias]

; ALIAS - Query alias
alias_volumes_loose = check_drivesize

; ALIAS - Query alias
alias_volumes = check_drivesize

; ALIAS - Query alias
alias_sched_all = check_tasksched show-all "syntax=${title}: ${exit_code}" "crit=exit_code ne 0"

; ALIAS - Query alias
alias_process_stopped = check_process "process=$ARG1$" "crit=state != 'stopped'"

; ALIAS - Query alias
alias_service = check_service

; ALIAS - Query alias
alias_process_hung = check_process "filter=is_hung" "crit=count>0"

; ALIAS - Query alias
alias_process_count = check_process "process=$ARG1$" "warn=count > $ARG2$" "crit=count > $ARG3$"

; ALIAS - Query alias
alias_process = check_process "process=$ARG1$" "crit=state != 'started'"

; ALIAS - Query alias
alias_mem = check_memory

; ALIAS - Query alias
alias_file_size = check_files "path=$ARG1$" "crit=size > $ARG2$" "top-syntax=${list}" "detail-syntax=${filename] ${size}" max-dir-depth=10

; ALIAS - Query alias
alias_event_log = check_eventlog

; ALIAS - Query alias
alias_service_ex = check_service "exclude=Net Driver HPZ12" "exclude=Pml Driver HPZ12" exclude=stisvc

; ALIAS - Query alias
alias_disk = check_drivesize

; ALIAS - Query alias
alias_file_age = check_files "path=$ARG1$" "crit=written > $ARG2$" "top-syntax=${list}" "detail-syntax=${filename] ${written}" max-dir-depth=10

; ALIAS - Query alias
alias_cpu_ex = check_cpu "warn=load > $ARG1$" "crit=load > $ARG2$" time=5m time=1m time=30s

; ALIAS - Query alias
alias_cpu = check_cpu

; ALIAS - Query alias
alias_up = check_uptime

; ALIAS - Query alias
alias_disk_loose = check_drivesize

; ALIAS - Query alias
alias_sched_task = check_tasksched show-all "filter=title eq '$ARG1$'" "detail-syntax=${title} (${exit_code})" "crit=exit_code ne 0"

; ALIAS - Query alias
alias_sched_long = check_tasksched "filter=status = 'running'" "detail-syntax=${title} (${most_recent_run_time})" "crit=most_recent_run_time < -$ARG1$"

; NRPEServer - A server that listens for incoming NRPE connection and processes incoming requests.
NRPEServer = 1

; Undocumented section
[/settings/NRPE/server]
ssl options = 
verify mode = none
insecure = true
allow arguments = true
The NSClient++ version is: 0.4.3.143-x64

Thanks

Pete

Re: Monitoring Disk Space by Size, rather than by percentage

Posted: Wed Jan 04, 2017 11:33 am
by rkennedy
The important part is placement. Use the NSClient++ configuration file posted below; I only had to move the NRPEServer = 1 line so that it sits under [/modules] -

Code: Select all

# If you want to fill this file with all avalible options run the following command:
#   nscp settings --generate --add-defaults --load-all
# If you want to activate a module and bring in all its options use:
#   nscp settings --activate-module <MODULE NAME> --add-defaults
# For details run: nscp settings --help


; Undocumented section
[/settings/default]

; ALLOWED HOSTS - A comaseparated list of allowed hosts. You can use netmasks (/ syntax) or * to create ranges.
allowed hosts = 99.99.99.99


; Undocumented section
[/modules]

; Undocumented key
Scheduler = 0

; CheckSystem - Various system related checks, such as CPU load, process state, service state memory usage and PDH counters.
CheckSystem = 1

; NSClientServer - A server that listens for incoming check_nt connection and processes incoming requests.
NSClientServer = 1

; CheckExternalScripts - Execute external scripts
CheckExternalScripts = 1

; CheckHelpers - Various helper function to extend other checks.
CheckHelpers = 1

; CheckEventLog - Check for errors and warnings in the event log.
CheckEventLog = 1

; CheckNSCP - Use this module to check the healt and status of NSClient++ it self
CheckNSCP = 1

; CheckDisk - CheckDisk can check various file and disk related things.
CheckDisk = 1

; NRPEServer - A server that listens for incoming NRPE connection and processes incoming requests.
NRPEServer = 1


; A list of templates for wrapped scripts.
[/settings/external scripts/wrappings]

; WRAPPING - An external script wrapping
ps1 = cmd /c echo scripts\\%SCRIPT% %ARGS%; exit($lastexitcode) | powershell.exe -command -

; WRAPPING - An external script wrapping
bat = scripts\\%SCRIPT% %ARGS%

; WRAPPING - An external script wrapping
An alias is an internal command that has been predefined to provide a single command without arguments. Be careful so you don't create loops (ie check_loop = check_a, check_a=check_loop)

; WRAPPING - An external script wrapping
vbs = cscript.exe //T:30 //NoLogo scripts\\lib\\wrapper.vbs %SCRIPT% %ARGS%


[/settings/external scripts/alias]

; ALIAS - Query alias
alias_volumes_loose = check_drivesize

; ALIAS - Query alias
alias_volumes = check_drivesize

; ALIAS - Query alias
alias_sched_all = check_tasksched show-all "syntax=${title}: ${exit_code}" "crit=exit_code ne 0"

; ALIAS - Query alias
alias_process_stopped = check_process "process=$ARG1$" "crit=state != 'stopped'"

; ALIAS - Query alias
alias_service = check_service

; ALIAS - Query alias
alias_process_hung = check_process "filter=is_hung" "crit=count>0"

; ALIAS - Query alias
alias_process_count = check_process "process=$ARG1$" "warn=count > $ARG2$" "crit=count > $ARG3$"

; ALIAS - Query alias
alias_process = check_process "process=$ARG1$" "crit=state != 'started'"

; ALIAS - Query alias
alias_mem = check_memory

; ALIAS - Query alias
alias_file_size = check_files "path=$ARG1$" "crit=size > $ARG2$" "top-syntax=${list}" "detail-syntax=${filename] ${size}" max-dir-depth=10

; ALIAS - Query alias
alias_event_log = check_eventlog

; ALIAS - Query alias
alias_service_ex = check_service "exclude=Net Driver HPZ12" "exclude=Pml Driver HPZ12" exclude=stisvc

; ALIAS - Query alias
alias_disk = check_drivesize

; ALIAS - Query alias
alias_file_age = check_files "path=$ARG1$" "crit=written > $ARG2$" "top-syntax=${list}" "detail-syntax=${filename] ${written}" max-dir-depth=10

; ALIAS - Query alias
alias_cpu_ex = check_cpu "warn=load > $ARG1$" "crit=load > $ARG2$" time=5m time=1m time=30s

; ALIAS - Query alias
alias_cpu = check_cpu

; ALIAS - Query alias
alias_up = check_uptime

; ALIAS - Query alias
alias_disk_loose = check_drivesize

; ALIAS - Query alias
alias_sched_task = check_tasksched show-all "filter=title eq '$ARG1$'" "detail-syntax=${title} (${exit_code})" "crit=exit_code ne 0"

; ALIAS - Query alias
alias_sched_long = check_tasksched "filter=status = 'running'" "detail-syntax=${title} (${most_recent_run_time})" "crit=most_recent_run_time < -$ARG1$"


; Undocumented section
[/settings/NRPE/server]
ssl options =
verify mode = none
insecure = true
allow arguments = true
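
To confirm the placement on other machines, it can help to print which section a key actually falls under. A minimal sketch, assuming a plain INI layout like the one above; key_section is a hypothetical helper and the demo file is made up for illustration:

```shell
#!/bin/bash
# Sketch only: key_section is a hypothetical helper.
# Usage: key_section FILE KEY  -> prints the [section] header that KEY appears under.
key_section() {
    awk -v key="$2" '
        /^\[.*\]/ { section = $0; sub(/[ \t\r]+$/, "", section); next }   # remember current section
        /^[;#]/   { next }                                                 # skip comment lines
        {
            split($0, kv, "=")
            name = kv[1]
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == key) print section
        }
    ' "$1"
}

# Demo file reproducing the misplaced key from the earlier post
demo=$(mktemp)
cat > "$demo" <<'EOF'
[/modules]
CheckDisk = 1

[/settings/external scripts/alias]
alias_cpu = check_cpu
NRPEServer = 1
EOF
key_section "$demo" NRPEServer   # prints [/settings/external scripts/alias]
rm -f "$demo"
```

For the corrected file the same call would be expected to print [/modules].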

Re: Monitoring Disk Space by Size, rather than by percentage

Posted: Thu Jan 05, 2017 9:51 am
by neworderfac33
That works just fine, thank you!

Well, it does at least when I issue the command:

Code: Select all

./check_nrpe -H 99.99.99.99 -t 30 -c CheckDriveSize -a ShowAll MinWarn=10G MinCrit=5G Drive=C: perf-unit=G
which returns:

Code: Select all

OK C:: Total: 39.656GB - Used: 16.175GB (41%) - Free: 23.481GB (59%)|'C: free'=23.48132G;10;5;0;39.65624 'C: free %'=59%;25;12;0;100
However, I'm running NSClient 0.4.3.143-x64, which, according to the documentation, should allow me to type:

Code: Select all

check_nrpe -H 99.99.99.99 -t 30 -c check_drivesize -a drive=C: 'warning=free<10G' 'critical=free<5G' show-all 'perf-config=*(unit:G)' detail-syntax='{${drive_or_name} ${free} free / ${size} total}' top-syntax='${status}: ${problem_list}'
but when I do, it returns:

Code: Select all

Exception processing request: Request command contained illegal metachars!
It's not THAT much of an issue as the earlier command gives me what I want - just wondering if you can see anything obvious?
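
The message suggests the NRPE server rejected an argument because of the characters it contains. A sketch of that kind of check follows. The character set used here (| ` & > < ' " \ [ ] { }) is an assumption based on how the NSClient++ documentation describes its "allow nasty characters" setting, so check it against your version; has_nasty_chars is a hypothetical helper, not NSClient++ code.

```shell
#!/bin/bash
# Sketch only: has_nasty_chars is a hypothetical helper, and the character set is an assumption.
# Succeeds if the argument contains any character from the assumed "nasty" set.
has_nasty_chars() {
    case "$1" in
        *'|'* | *'`'* | *'&'* | *'>'* | *'<'* | *"'"* | *'"'* | *'\'* | *'['* | *']'* | *'{'* | *'}'*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Arguments taken from the two commands in this post
for arg in 'Drive=C:' 'MinWarn=10G' 'warning=free<10G' 'top-syntax=${status}: ${problem_list}'; do
    if has_nasty_chars "$arg"; then
        echo "rejected: $arg"
    else
        echo "ok: $arg"
    fi
done
```

Under that assumption, the older CheckDriveSize form (MinWarn=10G, MinCrit=5G) contains none of these characters, which would be consistent with it working while the check_drivesize form with 'warning=free<10G' is refused.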

Also, any chance you could point me in the direction of the syntax for using nrpe within my services.cfg file, please? I currently have:

Code: Select all

define service {
       use                     generic-service,nagiosgraph
       host_name               MyServer
       #hostgroup_name          001-jenkins-live-masters
       service_description     Drive Space - C -Amount
       check_command           check_nrpe!CheckDriveSize! "w10 -c5"
       }
which I THOUGHT would give a warning when 10GB remained and a critical when 5GB remained - however, the Nagios web interface reports:

Code: Select all

WARNING : Total: 349.996MB - Used: 290.445MB (83%) - Free: 59.551MB (17%) 
This is puzzling enough, but a similar command:

Code: Select all

define service{
       use                      generic-service,nagiosgraph
       host_name                MN2WME12099U
       #hostgroup_name           001-jenkins-live-masters
       service_description      Drive Space - C - check_nt
       check_command            check_nt!USEDDISKSPACE!-l c -w 90 -c 95
       }
returns:

Code: Select all

c: - total: 39.66 Gb - used: 16.17 Gb (41%) - free 23.48 Gb (59%) 
I know that the values returned by the check_nt command are correct, so why are the values reported by check_nrpe not?

Thank you! :-)

Pete

P.S. Totally out of my comfort zone for a 54 year old Brit, but you guys rock! :-)