by Ulfandu » Mon Aug 24, 2020 7:00 am
Hello,
I've been dabbling with a script that checks the RAM status of both storage controllers on an HPE StoreVirtual 3200 SAN.
The script works to the point that it fetches the total RAM and the used RAM and calculates the available RAM. However, I don't understand why it constantly returns CRITICAL.
Here's the script in question:
#! /bin/sh
#
# MIBS required for the script to work:
# LEFTHAND-NETWORKS-NSM-CLUSTERING-MIB.mib
# LEFTHAND-NETWORKS-GLOBAL-REG-MIB.mib
# LEFTHAND-NETWORKS-NSM-MIB.mib
#
# Usage: ./check_lhc_mem.sh IP
# return states for nagios
STATE_OK=$(expr 0)
STATE_WARNING=$(expr 1)
STATE_CRITICAL=$(expr 2)
STATE_UNKNOWN=$(expr 3)
# check device name
Name=$(/usr/local/nagios/libexec/check_snmp -P 2c -H $1 -C corpsnmp -o SNMPv2-MIB::sysName.0|cut -d" " -f4)
RET=$?
if [[ $RET -ne 0 ]]
then
echo "query problem - No data received from host"
exit $STATE_UNKNOWN
fi
# read Physical Memory
RamTotal=$(/usr/local/nagios/libexec/check_snmp -P 2c -H $1 -C corpsnmp -o HOST-RESOURCES-MIB::hrStorageSize.1|cut -d" " -f4)
# read Used Memory
RamUsed=$(/usr/local/nagios/libexec/check_snmp -P 2c -H $1 -C corpsnmp -o HOST-RESOURCES-MIB::hrStorageUsed.1|cut -d" " -f4)
checkcount=0
iomodulestring=""
# convert KB --> MB
RamTotal=$(echo "$RamTotal / 1024" | bc)
RamUsed=$(echo "$RamUsed / 1024" | bc)
RamAvail=$(echo "$RamTotal - $RamUsed" |bc)
# convert MB --> GB
RamTotal=$(echo "$RamTotal / 1024" | bc)
RamUsed=$(echo "$RamUsed / 1024" | bc)
RamAvail=$(echo "$RamTotal - $RamUsed" |bc)
# calculate the percentage of available RAM
percentfree=$(echo "scale=2; $RamAvail/$RamTotal " |bc)
percentfree=$(echo "$percentfree*100" | bc)
# thresholds for critical and warning
critical=$(echo 10.0)
warning=$(echo 15.0)
# compare
if [[ "$(echo "$percentfree <= $critical" | bc -l)" ]]; then
echo "CRITICAL - "$Name" is "$percentfree" % free | used="$RamUsed"GB;;total="$RamTotal"GB;;available="$RamAvail"GB"
exit $STATE_CRITICAL
elif [[ "$(echo "$percentfree > $warning" | bc -l)" ]]; then
echo "CRITICAL - "$Name" is "$percentfree" % free | used="$RamUsed"GB;;total="$RamTotal"GB;;available="$RamAvail"GB"
exit $STATE_OK
elif [[ "$(echo "$percentfree <= $warning" | bc -l)" ]]; then
echo "CRITICAL - "$Name" is "$percentfree" % free | used="$RamUsed"GB;;total="$RamTotal"GB;;available="$RamAvail"GB"
exit $STATE_WARNING
else
echo "problem - No data received from host"
exit $STATE_UNKNOWN
fi
Here's a debug run of the script on a certain SV3200 controller:
++ expr 0
+ STATE_OK=0
++ expr 1
+ STATE_WARNING=1
++ expr 2
+ STATE_CRITICAL=2
++ expr 3
+ STATE_UNKNOWN=3
++ /usr/local/nagios/libexec/check_snmp -P 2c -H 10.88.255.22 -C corpsnmp -o SNMPv2-MIB::sysName.0
++ cut '-d ' -f4
+ Name=Company-SV3200-2-SC1
+ RET=0
+ [[ 0 -ne 0 ]]
++ /usr/local/nagios/libexec/check_snmp -P 2c -H 10.88.255.22 -C corpsnmp -o HOST-RESOURCES-MIB::hrStorageSize.1
++ cut '-d ' -f4
+ RamTotal=6649216
++ /usr/local/nagios/libexec/check_snmp -P 2c -H 10.88.255.22 -C corpsnmp -o HOST-RESOURCES-MIB::hrStorageUsed.1
++ cut '-d ' -f4
+ RamUsed=5991744
+ checkcount=0
+ iomodulestring=
++ echo '6649216 / 1024'
++ bc
+ RamTotal=6493
++ echo '5991744 / 1024'
++ bc
+ RamUsed=5851
++ bc
++ echo '6493 - 5851'
+ RamAvail=642
++ echo '6493 / 1024'
++ bc
+ RamTotal=6
++ echo '5851 / 1024'
++ bc
+ RamUsed=5
++ echo '6 - 5'
++ bc
+ RamAvail=1
++ bc
++ echo 'scale=2; 1/6 '
+ percentfree=.16
++ echo '.16*100'
++ bc
+ percentfree=16.00
++ echo 10.0
+ critical=10.0
++ echo 15.0
+ warning=15.0
++ echo '16.00 <= 10.0'
++ bc -l
+ [[ -n 0 ]]
+ echo 'CRITICAL - Company-SV3200-2-SC1 is 16.00 % free | used=5GB;;total=6GB;;available=1GB'
CRITICAL - Company-SV3200-2-SC1 is 16.00 % free | used=5GB;;total=6GB;;available=1GB
+ exit 2
I would greatly appreciate any help on this matter, as I have run into a brick wall.
I can provide the MIBs used for the script if necessary.
by jdunitz » Mon Aug 24, 2020 3:51 pm
There are a few problems I noticed:
First, consider truncating the percentage to a whole number by dividing it by 1 with scale=0 in bc -l:
percentfree=$(echo "scale=2; $RamAvail/$RamTotal " |bc -l)
percentfree=$(echo "$percentfree*100" | bc -l)
percentfree=$(echo "scale=0; $percentfree / 1"| bc -l)
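A possible alternative, assuming whole-number precision is enough: the percentage can also be computed with the shell's own integer arithmetic, without bc. The helper name percent_free is hypothetical and not part of the original script. Working from the raw values also avoids the rounding seen in the debug run, where truncating to whole GB turned 1/6 into 16 % although the KB figures give about 9 %.

```shell
#!/bin/bash
# Hypothetical helper: integer percentage of free memory.
# Arguments are total and used, in the same unit (for example KB).
percent_free() {
    local total=$1 used=$2
    # Refuse to divide when the total is missing or not positive.
    if [[ -z $total || $total -le 0 ]]; then
        return 1
    fi
    # Multiply before dividing so integer division keeps the precision.
    echo $(( (total - used) * 100 / total ))
}

percent_free 6649216 5991744   # raw values from the debug run; prints 9
```

The result is already a whole number, so it can go straight into a -le comparison.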
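Then I rewrote the comparison section at the end so that it no longer uses bc and does the numeric comparisons in the shell. In your version, [[ "$(... | bc -l)" ]] only tests whether bc printed a non-empty string, and bc prints 0 or 1 either way, so the first branch always ran (the [[ -n 0 ]] line in your debug run). The shell's -le works on whole numbers only, so set critical and warning to 10 and 15 rather than 10.0 and 15.0: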
# compare (integer tests: thresholds must be whole numbers)
if [[ $percentfree -le $critical ]]; then
    echo "CRITICAL - $Name is $percentfree % free | used=${RamUsed}GB;;total=${RamTotal}GB;;available=${RamAvail}GB"
    exit $STATE_CRITICAL
elif [[ $percentfree -le $warning ]]; then
    echo "WARNING - $Name is $percentfree % free | used=${RamUsed}GB;;total=${RamTotal}GB;;available=${RamAvail}GB"
    exit $STATE_WARNING
elif [[ $percentfree -gt $warning ]]; then
    echo "OK - $Name is $percentfree % free | used=${RamUsed}GB;;total=${RamTotal}GB;;available=${RamAvail}GB"
    exit $STATE_OK
else
    echo "problem - No data received from host"
    exit $STATE_UNKNOWN
fi
This isn't perfect, but I think it's closer to what you're after.
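To see why the original tests always took the first branch, here is a small sketch that needs neither bc nor SNMP. It feeds the two strings bc can print for a comparison (0 and 1) into both forms of the test. The function names are made up for the illustration.

```shell
#!/bin/bash
# Form used in the original script: [[ "string" ]] is true for any
# non-empty string, and "0" is non-empty.
nonempty_test() { [[ "$1" ]]; }

# Corrected form: treat the printed 0/1 as a number and compare it with 1.
numeric_test() { [[ "$1" -eq 1 ]]; }

for printed in 0 1; do
    nonempty_test "$printed" && a=true || a=false
    numeric_test "$printed" && b=true || b=false
    echo "bc printed $printed: non-empty test=$a, numeric test=$b"
done
# bc printed 0: non-empty test=true, numeric test=false
# bc printed 1: non-empty test=true, numeric test=true
```

So if you would rather keep bc and decimal thresholds, the original condition can be kept with one change: [[ "$(echo "$percentfree <= $critical" | bc -l)" -eq 1 ]].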
--Jeffrey
by Ulfandu » Wed Aug 26, 2020 1:28 am
Thanks! This helped a lot and I managed to figure out how to fix it. For those wondering, here's the full script now:
#!/bin/bash
#
# Example output:
# WARNING - corp-SV3200-1-SC2 is 15 % free | used=5508MB;;total=6493MB;;available=985MB
#
#
# Usage: ./check_lhc_mem.sh IP
# return states for nagios
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
# check device name (capture the output first so RET is check_snmp's status, not cut's)
Output=$(/usr/local/nagios/libexec/check_snmp -P 2c -H $1 -C corpsnmp -o SNMPv2-MIB::sysName.0)
RET=$?
Name=$(echo "$Output" | cut -d" " -f4)
if [[ $RET -ne 0 ]]
then
    echo "query problem - No data received from host"
    exit $STATE_UNKNOWN
fi
# read Physical Memory
RamTotal=$(/usr/local/nagios/libexec/check_snmp -P 2c -H $1 -C corpsnmp -t 60 -o HOST-RESOURCES-MIB::hrStorageSize.1|cut -d" " -f4)
# read Used Memory
RamUsed=$(/usr/local/nagios/libexec/check_snmp -P 2c -H $1 -C corpsnmp -t 60 -o HOST-RESOURCES-MIB::hrStorageUsed.1|cut -d" " -f4)
checkcount=0
iomodulestring=""
# convert KB --> MB
RamTotal=$(echo "$RamTotal / 1024" | bc)
RamUsed=$(echo "$RamUsed / 1024" | bc)
RamAvail=$(echo "$RamTotal - $RamUsed" |bc)
# calculate the percentage of available RAM
percentfree=$(echo "scale=2; $RamAvail/$RamTotal " |bc -l)
percentfree=$(echo "$percentfree*100" | bc -l)
percentfree=$(echo "scale=0; $percentfree / 1"| bc -l)
# thresholds for critical and warning
critical=10
warning=15
# compare free ram against thresholds
if [[ $percentfree -le $critical ]]; then
    echo "CRITICAL - $Name is $percentfree % free | used=${RamUsed}MB;;total=${RamTotal}MB;;available=${RamAvail}MB"
    exit $STATE_CRITICAL
elif [[ $percentfree -le $warning ]]; then
    echo "WARNING - $Name is $percentfree % free | used=${RamUsed}MB;;total=${RamTotal}MB;;available=${RamAvail}MB"
    exit $STATE_WARNING
elif [[ $percentfree -gt $warning ]]; then
    echo "OK - $Name is $percentfree % free | used=${RamUsed}MB;;total=${RamTotal}MB;;available=${RamAvail}MB"
    exit $STATE_OK
else
    echo "problem - No data received from host"
    exit $STATE_UNKNOWN
fi
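For anyone adapting this, the threshold logic could be pulled into a function so it can be exercised without an SNMP target. This is only a sketch: classify_free and its argument order are hypothetical, and it assumes whole-number percentages and the standard Nagios return codes used by the script.

```shell
#!/bin/bash
# Hypothetical helper: map a whole-number "percent free" value to a Nagios
# state. Prints the state label and returns the matching exit code
# (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
classify_free() {
    local pct=$1 warn=$2 crit=$3
    # Anything that is not a plain non-negative integer counts as UNKNOWN.
    if [[ ! $pct =~ ^[0-9]+$ ]]; then
        echo "UNKNOWN"; return 3
    fi
    if (( pct <= crit )); then
        echo "CRITICAL"; return 2
    elif (( pct <= warn )); then
        echo "WARNING"; return 1
    fi
    echo "OK"; return 0
}

# Try a value on each side of the thresholds, plus an empty reading.
for pct in 9 15 16 ""; do
    state=$(classify_free "$pct" 15 10)
    echo "pct='$pct' -> $state (exit $?)"
done
```

With warning=15 and critical=10, the loop reports CRITICAL for 9, WARNING for 15, OK for 16 and UNKNOWN for the empty value.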