I have a strange problem. I wrote a small plugin so that I can monitor how many open files are caused by a java process. When I execute the plugin on the remote server I get following return message:
Code: Select all
root@vm:~# /usr/lib/nagios/plugins/check_open_files.sh -s java -w 25 -c 35
CRITICAL - java has 434 or 42% files open of the 1024 allowed open files! OK 2014-01-22 17:41:56 0d 0h 1m 34s 1/3 CRITICAL - java has 1098 or 107% files open of the 1024 allowed open files!
These discrepancies are surfacing and disapearing all the time. What have I done wrong?
I include my plugin here:
Code: Select all
#!/bin/sh
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; version 2 of the License only.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
PROGNAME=`basename $0`
VERSION="Version 1.0"
AUTHOR="2014, Patrick Bass"
LV_W=100;
LV_C=100;
#Print Version
print_version() {
echo "$VERSION $AUTHOR"
}
# Help
usage(){
echo $PROGNAME $VERSION
echo $AUTHOR
echo
echo This is a Nagios plugin that will check the number of open files a service has running.
echo
echo OPTIONS:
echo -h Shows this help
echo -v Shows the Version
echo -s sets the service whose open files you want to check
echo -w sets the warning level
echo -c sets the critical level
}
# Option selection
while getopts "hvs:w:c:" opt; do
case $opt in
h)
usage
exit
;;
v)
print_version
exit
;;
s)
SERVICE=$OPTARG
;;
w)
LV_W=$OPTARG
;;
c)
LV_C=$OPTARG
;;
\?)
echo "Invalid option: -$OPTARG" >&2
exit 1
;;
:)
echo "Option -$OPTARG requires an argument." >&2
exit 1
;;
esac
done
# Are the levels for warning and critical set right?
if [ ${LV_W} -ge ${LV_C} ]
then
echo "Please adjust levels. The critical level must be higher than the warning level!"
exit 666
fi
# Lsof lists information about files opened by a processes
# print the number of bytes, words, and lines in files (-l = lines)
COUNT=$(lsof -c $SERVICE | wc -l)
# Get the value of maximum open files for each prozess and how much percent of these the service uses
MAX=$(ulimit -n)
USED=$(echo "scale=0; 100*$COUNT/$MAX" | bc -l)
# Compare the levels with the values for critical and warning
if [ $USED -ge ${LV_W} ]
then
if [ $USED -gt ${LV_C} ]
then
echo "CRITICAL - $SERVICE has $COUNT or $USED% files open of the $MAX allowed open files!"
exit $ST_CR
else
echo "WARNING - $SERVICE has $COUNT or $USED% files open of the $MAX allowed open files."
exit $ST_WR
fi
else
echo "OK - $SERVICE has $COUNT or $USED% files open of the $MAX allowed open files."
exit $ST_OK
fi
Thanks in advance for any advice offered.