Re: More decimal strangeness
Posted: Sun Jan 25, 2015 6:49 pm
by Box293
I think there are two things:
https://nagios-plugins.org/doc/guidelines.html#AEN200
Note # 1: space separated list of label/value pairs
Remove the , so that only a space separates each data source, not a comma and a space:
Code:
$PowerStruct.Outputstring += " | 'Total Power'=$($PowerStruct.PsuTotalCons)kW 'PSU1'=$($PowerStruct.Psu1CurCons)W 'PSU2'=$($PowerStruct.Psu2CurCons)W 'PSU3'=$($PowerStruct.Psu3CurCons)W 'PSU4'=$($PowerStruct.Psu4CurCons)W 'PSU5'=$($PowerStruct.Psu5CurCons)W 'PSU6'=$($PowerStruct.Psu6CurCons)W"
As for the second thing that I think might be going wrong, look at your output:
'Total Power'=2,516kW
'Total Power'=0,800kW
'Total Power'=1,228kW
I haven't had any experience with this, but is the , in the number acting as the decimal separator, because in your country a comma is used instead of a decimal point? That would come from your operating system's regional/language settings.
I ask this because on the article we have looked at, this example was given:
Suppose we wanted to display three decimal places? No problem; this command takes care of that: "{0:N3}" -f $a. Run that command and you’ll end up with output that looks like this: 19,385,790,464.000.
You can see it uses , to separate the thousands and a decimal point followed by three 0's.
So if you're using the same formatting code, then your numbers should be:
'Total Power'=2.516kW
'Total Power'=0.800kW
'Total Power'=1.228kW
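To make the two points above concrete, here is a minimal sketch in shell (the plugin in this thread is PowerShell, so treat this as an illustration only). The helper name `to_perfdata_number` and the sample values are made up, and the sketch assumes the comma is only ever a decimal separator, never a thousands separator.

```shell
#!/bin/bash
# Hypothetical helper: turn a decimal comma into a decimal point.
# Assumption: the value contains at most one comma and it is the decimal separator.
to_perfdata_number() {
    printf '%s' "${1/,/.}"
}

# Build the perfdata part: label=value pairs separated by a single space,
# with labels that contain spaces wrapped in single quotes.
total=$(to_perfdata_number "2,516")
perfdata="'Total Power'=${total}kW 'PSU1'=876W 'PSU2'=886W"

echo "OK: Total ${total}kW | ${perfdata}"
```

A blind comma replacement like this would mangle a value such as 19,385,790,464.000, so it only makes sense when the plugin never emits thousands separators.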
Re: More decimal strangeness
Posted: Mon Jan 26, 2015 12:08 pm
by lgroschen
Willem,
Just post back here once you've gone over Box's suggestions. He'll be available later today.
Re: More decimal strangeness
Posted: Mon Jan 26, 2015 4:22 pm
by WillemDH
Troy, Luke,
I followed your suggestions: I removed the , between the data sources and added this line:
Code:
$PowerStruct.PsuTotalCons = $PowerStruct.PsuTotalCons -replace ',','.'
This nicely resulted in my log outputting everything with a '.' instead of a ','.
Unfortunately, after a few runs I still get perfdata with too many decimals.
See the attached screenshots for the perfdata with the corresponding log entry:
Code:
2015-01-26 22:07:39 : Outputstring: OK: {Powerconsumption Enclosure fts_enclosure: (Total: 2.610kW)(PSU1: 876W)(PSU2: 886W)(PSU3: 0W)(PSU4: 0W)(PSU5: 0W)(PSU6: 848W)} | 'Total Power'=2.610kW 'PSU1'=876W 'PSU2'=886W 'PSU3'=0W 'PSU4'=0W 'PSU5'=0W 'PSU6'=848W
Grtz
Willem
Re: More decimal strangeness
Posted: Mon Jan 26, 2015 4:58 pm
by Box293
I think watching the debugging log in Nagios is in order:
Code:
sed -i 's/.*debug_level=.*/debug_level=-1/g' /usr/local/nagios/etc/nagios.cfg
service nagios restart
Force an immediate check. Once you see the incorrect number of decimal places on the Advanced tab you can turn debugging off.
Look through the file /usr/local/nagios/var/nagios.debug to see what data was received as this will be the data directly returned before anything is done with it.
When you are finished, this turns debugging off:
Code:
sed -i 's/.*debug_level=.*/debug_level=0/g' /usr/local/nagios/etc/nagios.cfg
service nagios restart
Re: More decimal strangeness
Posted: Mon Jan 26, 2015 5:17 pm
by WillemDH
OK, I enabled debugging and tested until I had the issue again (about 4 attempts).
See the attached screenshot for the check with the strange perfdata. Unfortunately, I was not able to find this particular host or service in the debug log... Maybe you are able to find some extra information? I'll send you the debug log by PM.
Grtz
Willem
Re: More decimal strangeness
Posted: Mon Jan 26, 2015 5:32 pm
by Box293
What is the name of the host as it appears in nagios?
What is the name of the service as it appears in nagios?
FYI, here's an example from MY debug log showing the data returned:
Code:
[1422318276.978261] [016.0] [pid=28246] ** Handling check result for service 'Load' on host 'centos01' from 'Mod-Gearman Worker @ mg-worker-01.box293.local'...
[1422318276.978265] [016.1] [pid=28246] HOST: centos01, SERVICE: Load, CHECK TYPE: Active, OPTIONS: 0, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 0, OUTPUT: OK - load average: 0.00, 0.00, 0.00|load1=0.000;15.000;30.000;0; load5=0.000;10.000;20.000;0; load15=0.000;5.000;10.000;0;
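One way to pull the matching lines out of a busy debug log is to search for the "HOST: ..., SERVICE: ...," pair shown in the sample above. This is only a sketch: the function name `find_check_result` is made up, and the log path is the one mentioned earlier in this thread, which may differ on your install.

```shell
#!/bin/bash
# Hypothetical helper: print debug-log lines for one host/service pair.
# Matches the "HOST: <host>, SERVICE: <service>," text as a fixed string.
find_check_result() {
    local host="$1" service="$2" logfile="$3"
    grep -F "HOST: ${host}, SERVICE: ${service}," "$logfile"
}

# Path taken from this thread; adjust it if your install differs.
debug_log="/usr/local/nagios/var/nagios.debug"
if [ -r "$debug_log" ]; then
    find_check_result "centos01" "Load" "$debug_log"
fi
```

Replace centos01 and Load with the host and service names exactly as they appear in Nagios; the match is case-sensitive.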
Re: More decimal strangeness
Posted: Tue Jan 27, 2015 5:53 am
by WillemDH
Troy,
As nagios.debug seems to be renamed to nagios.debug.old once it grows larger than 1 MB, after which logging just starts over in a new file, it's almost impossible to catch a check with the decimal issue before the file is replaced, which takes about 6 seconds. Can I configure the debug file to either grow larger or rotate to nagios.debug.oldx instead of always overwriting the same file?
Grtz
Willem
Re: More decimal strangeness
Posted: Tue Jan 27, 2015 3:49 pm
by cmerchant
You can increase the maximum size of the Nagios debug log file by changing this entry in /usr/local/nagios/etc/nagios.cfg
(1MB)
to
(5MB)
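The exact entry is not shown in this post. Assuming it refers to the max_debug_file_size directive in nagios.cfg (a value in bytes), the change could be scripted along the lines of the sed commands used earlier in the thread. The helper name `set_debug_file_size` is made up, and 5000000 is only an assumed value for "5MB".

```shell
#!/bin/bash
# Hypothetical helper: set max_debug_file_size (in bytes) in a nagios.cfg file.
# Assumption: the directive already exists on its own line in the file.
set_debug_file_size() {
    local cfg="$1" bytes="$2"
    sed -i "s/^max_debug_file_size=.*/max_debug_file_size=${bytes}/" "$cfg"
}

# Example, left commented out so nothing is changed by accident:
# set_debug_file_size /usr/local/nagios/etc/nagios.cfg 5000000
# service nagios restart
```

As with the debug_level change, Nagios needs a restart before the new size takes effect.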
Re: More decimal strangeness
Posted: Thu Aug 20, 2015 3:20 pm
by WillemDH
Hello,
I never really found the time to further debug this decimal strangeness, as the debug file was filling up too fast and I had more urgent work to do. But today I noticed similar decimal strangeness on my personal test XI server. I created a new plugin, based on an existing plugin, that gathers the percentage of CPU and memory used by a process and also counts the number of processes. And while I'm quite sure I never pass any decimals for the process count, I do see decimals appearing in my Highcharts graphs. Check the screenshot.
And this is the plugin I created (based on an existing script; it needs a little work, but it does the job):
Code:
#!/bin/bash
# Script name: check_lin_process.sh
# Version: v0.01.150816
# Created on: 17/08/2015
# Author: Willem D'Haese
# Purpose: Bash script that counts processes and returns total memory and cpu perfdata
# On GitHub: https://github.com/willemdh/check_lin_process
# On OutsideIT: http://outsideit.net/check-lin-process
# Recent History:
# 17/08/15 => Creation date
# Copyright:
# This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public
# License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any
# later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
# License for more details. You should have received a copy of the GNU General Public License along with this
# program. If not, see <http://www.gnu.org/licenses/>.
displayhelp="false"
runcheck=""
checkresult=""
hostname=""
process=""
fancyname=""
check=""
warning=""
critical=""
optstr=hH:p:N:C:w:c:
while getopts $optstr Switchvar
do
case $Switchvar in
c) critical=$OPTARG ;;
w) warning=$OPTARG ;;
C) check=$OPTARG ;;
N) fancyname=$OPTARG ;;
p) process=$OPTARG ;;
H) hostname=$OPTARG ;;
h) displayhelp="true" ;;
esac
done
shift $(( $OPTIND - 1 ))
if [ "$displayhelp" == "true" ]
then
echo "
-C, Specify a check type: CPU or Memory, default check type is Memory
-c, Specify a critical level for the check, default is 70%
-H, Specify hostname
-h, Display help information
-N, Specify a fancy name for the process
-p, Specify a process to be monitored
-w, Specify a warning level for the check, default is 60%
"
exit
fi
if [ "$process" == "" ]
then
echo "No process was specified. The '-p' switch must be used to specify the process"
exit 3
fi
if [ "$fancyname" == "" ]
then
fancyname=$process
fi
if [ "$check" != "" ]
then
if [ "$check" == "cpu" ] || [ "$check" == "Cpu" ] || [ "$check" == "CPU" ]
then
check="cpu"
elif [ "$check" == "memory" ] || [ "$check" == "Memory" ] || [ "$check" == "MEMORY" ]
then
check="mem"
fi
else
check="all"
checktoken="1"
fi
if [ "$check" == "cpu" ]
then
runcheckcpu=`ps -C $process -o%cpu= | paste -sd+ | bc`
roundedcpuresult=`echo $runcheckcpu | awk '{print int($1+0.5)}'`
elif [ "$check" == "mem" ]
then
runcheckmem=`ps -C $process -o%mem= | paste -sd+ | bc`
roundedmemresult=`echo $runcheckmem | awk '{print int($1+0.5)}'`
elif [ "$check" == "all" ]
then
runcheckcpu=`ps -C $process -o%cpu= | paste -sd+ | bc`
roundedcpuresult=`echo $runcheckcpu | awk '{print int($1+0.5)}'`
runcheckmem=`ps -C $process -o%mem= | paste -sd+ | bc`
roundedmemresult=`echo $runcheckmem | awk '{print int($1+0.5)}'`
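# Count by command name, like the ps -C calls above, so this script's own "-p $process" command line is not counted
proccount=`ps -C "$process" -o pid= | wc -l`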
else
echo "There is an error with your check's syntax. Please debug.."
exit 3
fi
#echo "roundedcpuresult = $roundedcpuresult"
#echo "roundedmemresult = $roundedmemresult"
if [ "$warning" == "" ]
then
warning=60
fi
if [ "$critical" == "" ]
then
critical=70
fi
if [ "$roundedcpuresult" == "" -o "$roundedmemresult" == "" ]
then
echo "The '$fancyname' process doesn't appear to be running. Please debug."
exit 3
fi
# Critical if either CPU or memory usage reaches the critical threshold
if [ "$roundedcpuresult" -ge "$critical" -o "$roundedmemresult" -ge "$critical" ]
then
echo "CRITICAL: $fancyname {CPU: ${roundedcpuresult}%}{Memory: ${roundedmemresult}%}{Count: ${proccount}} | ${fancyname}_cpu=$roundedcpuresult ${fancyname}_mem=$roundedmemresult ${fancyname}_count=$proccount"
exit 2
# Warning if either CPU or memory usage reaches the warning threshold
elif [ "$roundedcpuresult" -ge "$warning" -o "$roundedmemresult" -ge "$warning" ]
then
echo "WARNING: $fancyname {CPU: ${roundedcpuresult}%}{Memory: ${roundedmemresult}%}{Count: ${proccount}} | ${fancyname}_cpu=$roundedcpuresult ${fancyname}_mem=$roundedmemresult ${fancyname}_count=$proccount"
exit 1
# OK only if both CPU and memory usage are below the warning threshold
elif [ "$roundedcpuresult" -lt "$warning" -a "$roundedmemresult" -lt "$warning" ]
then
echo "OK: $fancyname {CPU: ${roundedcpuresult}%}{Memory: ${roundedmemresult}%}{Count: ${proccount}} | ${fancyname}_cpu=$roundedcpuresult ${fancyname}_mem=$roundedmemresult ${fancyname}_count=$proccount"
exit 0
fi
exit 3
Run it like this on a Nagios XI server for example:
Code:
/usr/local/nagios/libexec/check_lin_process.sh -p httpd
So why would Highcharts show decimal numbers? I'm still seeing this same behaviour at work too, for checks that only output integers.
Grtz
Willem
Re: More decimal strangeness
Posted: Thu Aug 20, 2015 10:33 pm
by Box293
I believe this is a side effect of pushing the data into RRD files. The numbers get averaged, which I believe is when they become decimals.
What happens when you view the raw data of the RRD files? To do this use my Performance Data Tool.
http://exchange.nagios.org/directory/Ad ... ol/details
Upload it into Nagios XI via Admin > System Extensions > Manage Components.
There you should be able to see the raw data in the RRD files. Are you seeing decimals here?
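To illustrate why averaging alone can turn integer-only perfdata into decimals, here is a minimal shell sketch. It is not how rrdtool works internally (its consolidation is time-weighted and more involved); the function name `average_samples` and the sample values are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: plain arithmetic mean of integer samples, two decimals.
# A stand-in for the idea of AVERAGE consolidation, not rrdtool's real algorithm.
average_samples() {
    printf '%s\n' "$@" | LC_ALL=C awk '{ sum += $1 } END { if (NR > 0) printf "%.2f\n", sum / NR }'
}

# Three integer process counts that fall into the same consolidation interval
average_samples 5 6 6
```

This prints 5.67 even though every input is a whole number. If rrdtool is available on the server, `rrdtool fetch <file>.rrd AVERAGE` is another way to look at the stored values directly.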