
Different output of plugin via nrpe and on client

Posted: Thu Jan 23, 2014 8:55 am
by Mr. Vitriol
Hi,
I have a strange problem. I wrote a small plugin to monitor how many files a java process has open. When I execute the plugin on the remote server, I get the following return message:

Code: Select all

 root@vm:~# /usr/lib/nagios/plugins/check_open_files.sh -s java -w 25 -c 35
CRITICAL - java has 434 or 42% files open of the 1024 allowed open files! 
When I execute the same plugin via nrpe or look at the nagios web interface, I get:

OK 2014-01-22 17:41:56 0d 0h 1m 34s 1/3 CRITICAL - java has 1098 or 107% files open of the 1024 allowed open files!

These discrepancies keep appearing and disappearing. What have I done wrong?

I include my plugin here:

Code: Select all


#!/bin/sh

#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation; version 2 of the License only.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

PROGNAME=`basename $0`
VERSION="Version 1.0"
AUTHOR="2014, Patrick Bass"

LV_W=100;
LV_C=100;

# Nagios plugin exit codes (used by the exit statements at the end)
ST_OK=0
ST_WR=1
ST_CR=2


#Print Version
print_version() {
    echo "$VERSION $AUTHOR"
}

# Help
usage(){
    echo $PROGNAME $VERSION
    echo $AUTHOR
	echo 
	echo This is a Nagios plugin that will check the number of open files a service has running.
	echo 
	echo OPTIONS:
	echo  -h Shows this help
	echo  -v Shows the Version
	echo  -s sets the service whose open files you want to check
	echo  -w sets the warning level
	echo  -c sets the critical level
}

# Option selection
while getopts "hvs:w:c:" opt; do
  case $opt in
    h)
      usage
      exit
      ;;
    v)
      print_version
      exit
      ;;
	s)
	  SERVICE=$OPTARG
	  ;;
    w)
      LV_W=$OPTARG
      ;;
    c)
      LV_C=$OPTARG 
      ;;

    \?)
      echo "Invalid option: -$OPTARG" >&2
      exit 1
      ;;
    :)
      echo "Option -$OPTARG requires an argument." >&2
      exit 1
      ;;
  esac
done

# Are the levels for warning and critical set right?
    if [ ${LV_W} -ge ${LV_C} ]
    then
        echo "Please adjust levels. The critical level must be higher than the warning level!"
	exit 3  # UNKNOWN; exit codes above 255 wrap around
    fi

# lsof lists the files opened by processes; -c selects processes whose
# command name begins with $SERVICE. wc -l counts the output lines
# (the count includes lsof's header line).
COUNT=$(lsof -c "$SERVICE" | wc -l)

# Get the maximum number of open files (the soft limit of the shell running this plugin) and the percentage of it the service uses
MAX=$(ulimit -n)
USED=$(echo "scale=0; 100*$COUNT/$MAX" | bc -l)

# Compare the levels with the values for critical and warning

if [ $USED -ge ${LV_W} ]
    then
        if [ $USED -gt ${LV_C} ]
			then
				echo "CRITICAL - $SERVICE has $COUNT or $USED% files open of the $MAX allowed open files!"
				exit $ST_CR
			else
				echo "WARNING - $SERVICE has $COUNT or $USED% files open of the $MAX allowed open files."
				exit $ST_WR
		fi
	else
		echo "OK - $SERVICE has $COUNT or $USED% files open of the $MAX allowed open files."
        exit $ST_OK
fi

In addition, I have done something wrong with my script, because even when the state is critical there is a green OK on the status screen.

Thanks in advance for any advice offered.
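
A green OK next to a CRITICAL message is what you get whenever the variable passed to exit is empty: `exit $ST_CR` with ST_CR unset becomes a bare `exit`, which returns the status of the preceding echo (0), and Nagios reads 0 as OK. Below is a minimal sketch of the usual plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The classify helper is hypothetical and only mirrors the comparison logic of the script above; it is not part of the posted plugin.

```sh
#!/bin/sh
# Standard Nagios plugin return codes.
ST_OK=0
ST_WR=1
ST_CR=2
ST_UK=3

# classify USED WARN CRIT (hypothetical helper):
# prints the state name and returns the matching plugin return code.
# Same thresholds as the posted script: warning at >= WARN, critical at > CRIT.
classify() {
    if [ "$1" -ge "$2" ]; then
        if [ "$1" -gt "$3" ]; then
            echo "CRITICAL"; return "$ST_CR"
        fi
        echo "WARNING"; return "$ST_WR"
    fi
    echo "OK"; return "$ST_OK"
}

# Possible use at the end of a plugin:
#   classify "$USED" "$LV_W" "$LV_C"; exit $?
```

The same effect can be had without a helper by assigning ST_OK, ST_WR and ST_CR near the top of the script before any `exit $ST_...` line is reached.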

Re: Different output of plugin via nrpe and on client

Posted: Thu Jan 23, 2014 10:42 am
by slansing
Can you run this plugin both locally, and from the Nagios server's CLI as an NRPE check within 10 seconds of each other? I'm curious to see how close they will be. Of course, these numbers could greatly fluctuate within 10 seconds depending on how volatile they are, and what you are doing with Java.

Re: Different output of plugin via nrpe and on client

Posted: Fri Jan 24, 2014 2:59 am
by Mr. Vitriol
Most of the time the two outputs are identical or very close to each other. At other times I get so vastly different outputs that I am worried that something is wrong.

On this one server I get different outputs all the time:
5 open remote
473 open locally
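
One thing worth ruling out with numbers this far apart (an assumption, the thread does not confirm it): the local run above was done as root, while NRPE usually runs the plugin as an unprivileged user, which may not be allowed to see the file descriptors of a process owned by someone else. A rough sketch of a count that fails loudly instead of silently reporting a low number follows. It assumes a Linux /proc layout, and count_fds is a hypothetical helper.

```sh
#!/bin/sh
# count_fds DIR (hypothetical helper):
# prints the number of entries in a /proc/<pid>/fd-style directory.
# If the current user cannot read DIR, prints an UNKNOWN message and returns 3.
count_fds() {
    if [ ! -r "$1" ]; then
        echo "UNKNOWN - cannot read $1 as user $(id -un)"
        return 3
    fi
    # One directory entry per open file descriptor; strip padding from wc.
    ls -A "$1" | wc -l | tr -d ' '
}

# Possible use, with PID holding the java process ID (assumption):
#   count_fds "/proc/$PID/fd"
```

Running the plugin by hand as the same user NRPE uses would show whether the two counts then agree.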

A colleague of mine said that the maximum limit can be different for java than for all other processes. In that case my ulimit -n would give me the wrong number. Could that be the case here, and could that be responsible?
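
For what it's worth, `ulimit -n` reports the limit of the shell that runs the plugin, not the limit of the java process. On Linux the per-process value can be read from /proc/<pid>/limits. A minimal sketch, assuming that file layout; soft_nofile_limit is a hypothetical helper name.

```sh
#!/bin/sh
# soft_nofile_limit FILE (hypothetical helper):
# prints the soft "Max open files" value from a /proc/<pid>/limits-style file.
# The line looks like: "Max open files   1024   4096   files",
# so the soft limit is the fourth whitespace-separated field.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Possible use, with PID holding the java process ID (assumption):
#   MAX=$(soft_nofile_limit "/proc/$PID/limits")
```

If the value differs from what `ulimit -n` prints in the plugin's shell, the percentage in the plugin output is computed against the wrong maximum.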

Re: Different output of plugin via nrpe and on client

Posted: Fri Jan 24, 2014 2:39 pm
by slansing
Yes, it is possible. It may also just be that the java processes are spawning at an insane rate and then quieting down between the times you run the checks. I don't know about you guys, but when we play around with java here it can get quite volatile.