100%+ used memory

altsysrq
Posts: 17
Joined: Thu Feb 26, 2015 12:35 pm

Post by altsysrq »

I am using custom_check_mem to monitor memory usage on our systems. We pass the "-n" switch, which is supposed to keep cached memory out of the used total. This is documented here:
https://support.nagios.com/kb/article/m ... s-774.html
"If you do not want the cached memory to be part of the thresholds calculations the -n argument is used:"
Unfortunately, that does not seem to be what happens. Even the example given in that document appears to include cached memory. In our situation, cached memory ends up added into the calculation and the reported percentage often exceeds 100%. For example:

Code: Select all

root@pxe1:~# /usr/lib/nagios/plugins/custom_check_mem -w 20 -c 10 -n
 OK - 6875 / 3951 MB (174%) Free Memory, Used: 335 MB, Shared: 40 MB, Buffers: 3027 MB, Cached: 3259 MB | total=3951MB free=6875MB used=335MB shared=40 buffers=3027MB cached=3259MB

root@pxe1:~# /usr/lib/nagios/plugins/custom_check_mem -w 20 -c 10
 OK - 3617 / 3951 MB (91%) Free Memory, Used: 334 MB, Shared: 40 MB, Buffers: 3027 MB, Cached: 3259 MB | total=3951MB free=3617MB used=334MB shared=40 buffers=3027MB cached=3259MB
This seems to be a recent issue: we originally added -n so that cached memory would not trigger low-memory alerts. Am I using this option incorrectly? Could a Linux update have changed how memory is calculated? Any other explanations are appreciated.

[edit] - not a php file
Last edited by altsysrq on Tue Jun 05, 2018 1:59 pm, edited 2 times in total.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: 100%+ used memory

Post by scottwilkerson »

Could you attach your copy of /usr/lib/nagios/plugins/custom_check_mem

also, what OS is this?
Former Nagios employee
altsysrq
Posts: 17
Joined: Thu Feb 26, 2015 12:35 pm

Re: 100%+ used memory

Post by altsysrq »

Apologies, here is the file (extension not allowed):

Code: Select all

#!/bin/bash
# Script to check real memory usage
# L.Gill 02/05/06 - V.1.0
# ------------------------------------------
# ########  Script Modifications  ##########
# ------------------------------------------
# Who	 When	   What
# ---    ----      ----
# LGill	 17/05/06  "$percent" lt 1% fix - sed edits dc result beginning with "."
#
#
#!/bin/bash
USAGE="`basename $0` [-w|--warning]<percent free> [-c|--critical]<percent free> [-n|--nocache]"
THRESHOLD_USAGE="WARNING threshold must be greater than CRITICAL: `basename $0` $*"
calc=/tmp/memcalc
percent_free=/tmp/mempercent
critical=""
warning=""
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
nocache=0
# print usage
if [[ $# -lt 4 ]]
then
	echo ""
	echo "Wrong Syntax: `basename $0` $*"
	echo ""
	echo "Usage: $USAGE"
	echo ""
	exit 0
fi
# read input
while [[ $# -gt 0 ]]
  do
        case "$1" in
               -w|--warning)
               shift
               warning=$1
        ;;
               -c|--critical)
               shift
               critical=$1
        ;;

               -n|--nocache)
               nocache=1
        ;;
        esac
        shift
  done
# verify input
if [[ $warning -eq $critical || $warning -lt $critical ]]
then
	echo ""
	echo "$THRESHOLD_USAGE"
	echo ""
        echo "Usage: $USAGE"
	echo ""
        exit 0
fi

memoutput=`free -m| head -2 | tail -1`

# Total memory available
#total=`free -m | head -2 |tail -1 |gawk '{print $2}'`
total=`echo $memoutput | gawk '{print $2}'`
# Total memory used
#used=`free -m | head -2 |tail -1 |gawk '{print $3}'`
used=`echo $memoutput | gawk '{print $3}'`
# Calc total minus used
#free=`free -m | head -2 |tail -1 |gawk '{print $2-$3}'`
if [ "$nocache" -eq "1" ]; then
    free=`echo $memoutput | gawk '{print $2-$3+$7}'`
else
    free=`echo $memoutput | gawk '{print $2-$3}'`
fi

shared=`echo $memoutput | gawk '{print $5}'`
buffers=`echo $memoutput | gawk '{print $6}'`
cached=`echo $memoutput | gawk '{print $7}'`
# free=$free-$cached
# normal values
#echo "$total"MB total
#echo "$used"MB used
#echo "$free"MB free

# make it into % percent free = ((free mem / total mem) * 100)
echo "5" > $calc # decimal accuracy
echo "k" >> $calc # commit
echo "100" >> $calc # multiply
echo "$free" >> $calc # division integer
echo "$total" >> $calc # division integer
echo "/" >> $calc # division sign
echo "*" >> $calc # multiplication sign
echo "p" >> $calc # print
percent=`/usr/bin/dc $calc|/bin/sed 's/^\./0./'|/usr/bin/tr "." " "|/usr/bin/gawk {'print $1'}`
#percent1=`/usr/bin/dc $calc`
#echo "$percent1"
if [[ "$percent" -le  $warning ]]
        then
    string="WARNING"
    result=1
fi
if [[ "$percent" -le  $critical ]]
    then
    string="CRITICAL"
    result=2
fi
if [[ "$percent" -gt  $warning ]]
    then
    string="OK"
    result=0
fi

echo "$string - $free / $total MB ($percent%) Free Memory, Used: $used MB, Shared: $shared MB, Buffers: $buffers MB, Cached: $cached MB | total="$total"MB free="$free"MB used="$used"MB shared="$shared"$MB buffers="$buffers"MB cached="$cached"MB"
exit $result
This is on an Ubuntu 16.04 system.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: 100%+ used memory

Post by scottwilkerson »

I just re-checked how this plugin works. On Ubuntu, it looks like the 7th column of the 'free -m' output that the plugin parses holds total available memory rather than cached memory.

I'll submit a bug report, but I don't have a fix for that at this time.
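
To illustrate where the 174% comes from, here is a minimal sketch that replays the plugin's -n arithmetic on the numbers from the first post. The sample "Mem:" line is reconstructed from that plugin output; the free column (589) does not appear there and is an assumed value derived as total - used - buff/cache. It also uses plain awk rather than gawk, purely for portability.

```shell
#!/bin/bash
# Sample "Mem:" line rebuilt from the plugin output earlier in the thread.
# Assumption: the free column (589) is derived, not taken from the thread;
# the calculation below never reads it.
memoutput="Mem: 3951 335 589 40 3027 3259"

# Same arithmetic as the plugin's -n branch: total - used + column 7.
# On procps-ng 3.3.10+ column 7 is "available", not "cached".
free_mb=$(echo "$memoutput" | awk '{print $2-$3+$7}')
total_mb=$(echo "$memoutput" | awk '{print $2}')

# Integer percentage, truncated like the plugin's final value.
percent=$(( 100 * free_mb / total_mb ))

echo "free=${free_mb}MB total=${total_mb}MB percent=${percent}%"
```

Because "available" already includes most of the reclaimable cache, adding it to total - used counts that memory twice, which is how the result ends up above the installed total.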
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: 100%+ used memory

Post by npolovenko »

@altsysrq, Please open the plugin with a text editor and change:

Code: Select all

if [ "$nocache" -eq "1" ]; then
    free=`echo $memoutput | gawk '{print $2-$3+$7}'`
to

Code: Select all

if [ "$nocache" -eq "1" ]; then
    free=`echo $memoutput | gawk '{print $2-$3-$7}'`
So instead of +$7 there should be -$7.
Let me know if this fixes the problem.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
altsysrq
Posts: 17
Joined: Thu Feb 26, 2015 12:35 pm

Re: 100%+ used memory

Post by altsysrq »

That won't work. The 'free' command in Ubuntu 16.04 comes from procps-ng 3.3.10 and does not count cache as part of used memory. The 16.04 man page describes used memory as:
used Used memory (calculated as total - free - buffers - cache)
Additionally, custom_check_mem expects cache to be in column 7. In 16.04 that information is in column 6 (in 14.04 it is in column 7).

Ubuntu 16.04 (free from procps-ng 3.3.10):

Code: Select all

root@pxe1:~# free -m
              total        used        free      shared  buff/cache   available
Mem:           3951         334         529          40        3087        3257
Swap:          4091           0        4091
Ubuntu 14.04 (free from procps-ng 3.3.9):

Code: Select all

root@cacher:~# free -m
             total       used       free     shared    buffers     cached
Mem:          2000       1654        346          0        213       1068
-/+ buffers/cache:        372       1628
Swap:         1019          2       1017
The difference seems to come from two versions of 'free' in the procps-ng package. (Red Hat 7 uses procps-ng 3.3.10, so the problem exists there too.)

If you can think of quick solutions to this let me know.
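
One possible direction, as a rough sketch only: look columns up by the header names that 'free -m' prints instead of by fixed position, so both layouts shown above are handled. The function name mem_from_free is hypothetical, it reads 'free -m' text on stdin, and it has only been reasoned through against the two sample outputs in this post, not tested on other versions of free.

```shell
#!/bin/bash
# Hypothetical helper: read `free -m` text on stdin and print
# "<total> <available>" in MB, choosing columns by header name.
mem_from_free() {
    awk '
        # Header row: remember each column name. The data rows start with a
        # "Mem:" label that the header lacks, so data fields sit one further right.
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i + 1; next }
        $1 == "Mem:" {
            if ("available" in col)
                # procps-ng 3.3.10+ layout: use the "available" column as-is.
                avail = $(col["available"])
            else
                # Older layout: approximate it as free + buffers + cached.
                avail = $(col["free"]) + $(col["buffers"]) + $(col["cached"])
            print $(col["total"]), avail
        }
    '
}

# Usage on a live system:
#   free -m | mem_from_free
```

Fed the 16.04 sample above, this picks 3257 from the "available" column; fed the 14.04 sample, it sums 346 + 213 + 1068. Whether "available" is the right figure to threshold on is a separate decision for the plugin maintainers.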
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: 100%+ used memory

Post by npolovenko »

@altsysrq, Oh, I see... In this case, we'd probably have to stick with the bug report.
altsysrq
Posts: 17
Joined: Thu Feb 26, 2015 12:35 pm

Re: 100%+ used memory

Post by altsysrq »

For reference, on an Ubuntu 18.04 test system:

Code: Select all

root@test:~# free -V
free from procps-ng 3.3.12
root@test:~# free -m
              total        used        free      shared  buff/cache   available
Mem:            962         179         465           2         317         634
Swap:          1923           0        1923
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: 100%+ used memory

Post by scottwilkerson »

Thanks
altsysrq
Posts: 17
Joined: Thu Feb 26, 2015 12:35 pm

Re: 100%+ used memory

Post by altsysrq »

I think we are settling on this code as a simple workaround. We are using Puppet to deploy the update, so it should go pretty smoothly. Posting it here to help others and to get feedback.

Replace lines 74-78 of the script (the if/else block that sets free) with:

Code: Select all

free_version=`free -V | gawk '{print $4}'`
free_requiredver=3.3.10

if [ "$(printf '%s\n' "$free_requiredver" "$free_version" | sort -V | head -n1)" = "$free_requiredver" ]; then
       #echo "Greater than or equal to $free_requiredver"
       if [ "$nocache" -eq "1" ]; then
         #echo "nocache"
         free=`echo $memoutput | gawk '{print $2-$3}'`
       else
         #echo "cache"
         free=`echo $memoutput | gawk '{print $2-$3-$6}'`
       fi
else
       #echo "Less than $free_requiredver"
       if [ "$nocache" -eq "1" ]; then
         #echo "nocache"
         free=`echo $memoutput | gawk '{print $2-$3+$7}'`
       else
         #echo "cache"
         free=`echo $memoutput | gawk '{print $2-$3}'`
       fi
fi
Let me know what you think or if you have any questions.

[edit] source is from here: https://unix.stackexchange.com/a/285928
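
For anyone unsure how the version test in that workaround behaves, here is a small stand-alone sketch of the same 'sort -V | head -n1' comparison. The function name version_ge is hypothetical, and the sketch assumes a sort that supports -V (GNU coreutils does).

```shell
#!/bin/bash
# Hypothetical helper: succeeds when version $1 is >= version $2.
version_ge() {
    # sort -V orders version strings component by component, so 3.3.9
    # sorts before 3.3.10. If the lowest of the pair is the required
    # version, then $1 is at least $2.
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# The three procps-ng versions mentioned in this thread.
for v in 3.3.9 3.3.10 3.3.12; do
    if version_ge "$v" 3.3.10; then
        echo "$v: new free layout (buff/cache, available)"
    else
        echo "$v: old free layout (buffers, cached)"
    fi
done
```

A plain string comparison would rank 3.3.9 above 3.3.10, which is why the version-aware sort is used here.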