More decimal strangeness

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: More decimal strangeness

Post by Box293 »

I think there are two things:

https://nagios-plugins.org/doc/guidelines.html#AEN200

Note # 1: space separated list of label/value pairs

Remove the , so that only a space separates each data source, not a comma and a space:

Code: Select all

$PowerStruct.Outputstring += " | 'Total Power'=$($PowerStruct.PsuTotalCons)kW 'PSU1'=$($PowerStruct.Psu1CurCons)W 'PSU2'=$($PowerStruct.Psu2CurCons)W 'PSU3'=$($PowerStruct.Psu3CurCons)W 'PSU4'=$($PowerStruct.Psu4CurCons)W 'PSU5'=$($PowerStruct.Psu5CurCons)W 'PSU6'=$($PowerStruct.Psu6CurCons)W"
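For a bash plugin, the same rule (one space between label/value pairs, no commas) could be sketched roughly as below. `build_perfdata` is a hypothetical helper, not part of any Nagios tooling, and it only joins the pairs; it does not validate units or thresholds.

```shell
#!/bin/bash
# Hypothetical helper: join label/value pairs into one perfdata string.
# Usage: build_perfdata "label" "value+UOM" ["label" "value+UOM" ...]
build_perfdata() {
    local out="" label value
    while [ "$#" -ge 2 ]; do
        label=$1
        value=$2
        shift 2
        # Single-quote every label so labels containing spaces stay one token,
        # and separate pairs with exactly one space (never a comma).
        out+="${out:+ }'${label}'=${value}"
    done
    printf '%s\n' "$out"
}

build_perfdata "Total Power" "2.516kW" "PSU1" "876W" "PSU2" "886W"
# prints: 'Total Power'=2.516kW 'PSU1'=876W 'PSU2'=886W
```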
As for the second thing I think might be going wrong ... in your output:
'Total Power'=2,516kW
'Total Power'=0,800kW
'Total Power'=1,228kW
I haven't had any experience with this, but is the , in the number a decimal separator? In other words, does your country use a comma instead of a decimal point, as part of your operating system's regional settings?

I ask this because on the article we have looked at, this example was given:
Suppose we wanted to display three decimal places? No problem; this command takes care of that: "{0:N3}" -f $a. Run that command and you’ll end up with output that looks like this: 19,385,790,464.000.
You can see it uses , to separate the thousands and . as the decimal point, followed by three 0's.

So if you're using the same formatting code, then your numbers should be:
'Total Power'=2.516kW
'Total Power'=0.800kW
'Total Power'=1.228kW
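If the number is formatted on a system whose locale uses a decimal comma, one way to sidestep the problem in a shell-based plugin is to force the C locale for the formatting step. This is a minimal sketch, assuming the total is available as a whole number of watts and should be shown as kW with three decimals; `format_kw` is a made-up name.

```shell
#!/bin/bash
# Hypothetical helper: convert watts to kW with three decimals.
# LC_ALL=C forces a dot as the decimal separator, whatever the system locale is.
format_kw() {
    LC_ALL=C awk -v w="$1" 'BEGIN { printf "%.3f\n", w / 1000 }'
}

format_kw 2610
# prints: 2.610
```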
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
lgroschen
Posts: 384
Joined: Wed Nov 27, 2013 1:17 pm

Re: More decimal strangeness

Post by lgroschen »

Willem,

Just post back here once you've gone over Box's suggestions. He'll be available later today.
/Luke
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: More decimal strangeness

Post by WillemDH »

Troy, Luke,

I followed your suggestions: I removed the , between the data sources and added this line:

Code: Select all

$PowerStruct.PsuTotalCons = $PowerStruct.PsuTotalCons -replace ',','.'
	
This nicely resulted in my log outputting everything with a '.' instead of a ','.

Unfortunately, after a while I still get perfdata with too many decimals.

See attached screenshots for the perfdata with corresponding log entry:

Code: Select all

2015-01-26 22:07:39 : Outputstring: OK: {Powerconsumption Enclosure fts_enclosure: (Total: 2.610kW)(PSU1: 876W)(PSU2: 886W)(PSU3: 0W)(PSU4: 0W)(PSU5: 0W)(PSU6: 848W)} | 'Total Power'=2.610kW 'PSU1'=876W 'PSU2'=886W 'PSU3'=0W 'PSU4'=0W 'PSU5'=0W 'PSU6'=848W
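To spot-check a plugin log for leftover decimal commas, something along these lines could help. It is only a sketch: `perfdata_of` and `has_comma_decimal` are hypothetical helpers, and the pattern assumes the value directly follows the `=` sign.

```shell
#!/bin/bash
# Hypothetical helper: print the perfdata part of an output line,
# i.e. everything after the first '|', with leading whitespace trimmed.
perfdata_of() {
    local line=$1 perf
    [[ $line == *"|"* ]] || return 1
    perf=${line#*|}
    perf="${perf#"${perf%%[![:space:]]*}"}"
    printf '%s\n' "$perf"
}

# Hypothetical helper: succeed if a value such as 2,516 follows an '=' sign.
has_comma_decimal() {
    local re='=[0-9]+,[0-9]+'
    [[ $1 =~ $re ]]
}

# Example: flag a log line whose perfdata still contains a decimal comma.
line="OK: {Total: 2,516kW} | 'Total Power'=2,516kW 'PSU1'=876W"
if has_comma_decimal "$(perfdata_of "$line")"; then
    echo "decimal comma found in perfdata"
fi
```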
Grtz

Willem
Nagios XI 5.8.1
https://outsideit.net
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: More decimal strangeness

Post by Box293 »

I think watching the debugging log in Nagios is in order:

Code: Select all

sed -i 's/.*debug_level=.*/debug_level=-1/g' /usr/local/nagios/etc/nagios.cfg
service nagios restart
Force an immediate check. Once you see the incorrect number of decimal places on the Advanced tab you can turn debugging off.

Look through the file /usr/local/nagios/var/nagios.debug to see what data was received; this is the data exactly as the plugin returned it, before Nagios does anything with it.

When you are finished, this turns debugging off:

Code: Select all

sed -i 's/.*debug_level=.*/debug_level=0/g' /usr/local/nagios/etc/nagios.cfg
service nagios restart
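The two sed commands above could also be wrapped in a small function that takes a backup first. This is a rough sketch, assuming GNU sed; `set_debug_level` is a hypothetical name. Unlike the commands above, it only touches lines that start with `debug_level=`, so commented-out examples in the file are left alone.

```shell
#!/bin/bash
# Hypothetical helper: set debug_level in a nagios.cfg-style file.
# Args: path to the config file, new debug level (e.g. -1 or 0).
set_debug_level() {
    local cfg=$1 level=$2
    # Keep a backup next to the file before editing it in place.
    cp -p "$cfg" "$cfg.bak" || return 1
    # Anchor on the start of the line so commented-out lines are untouched.
    sed -i "s/^debug_level=.*/debug_level=${level}/" "$cfg"
}

# Example usage with the paths from this thread:
# set_debug_level /usr/local/nagios/etc/nagios.cfg -1 && service nagios restart
# ... force the check and reproduce the problem ...
# set_debug_level /usr/local/nagios/etc/nagios.cfg 0 && service nagios restart
```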
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: More decimal strangeness

Post by WillemDH »

OK, I enabled debugging and tested until I had the issue again (about 4 times).

See the attached screenshot for the check with strange perfdata. Unfortunately, I was not able to find this particular host or service in the debug log. Maybe you can find some extra information? I'll send you the debug log by PM.

Grtz

Willem
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: More decimal strangeness

Post by Box293 »

What is the name of the host as it appears in nagios?
What is the name of the service as it appears in nagios?

FYI, here's an example from MY debug log showing the data returned:

Code: Select all

[1422318276.978261] [016.0] [pid=28246] ** Handling check result for service 'Load' on host 'centos01' from 'Mod-Gearman Worker @ mg-worker-01.box293.local'...
[1422318276.978265] [016.1] [pid=28246] HOST: centos01, SERVICE: Load, CHECK TYPE: Active, OPTIONS: 0, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 0, OUTPUT: OK - load average: 0.00, 0.00, 0.00|load1=0.000;15.000;30.000;0; load5=0.000;10.000;20.000;0; load15=0.000;5.000;10.000;0; 
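To pull the matching lines for one host/service pair out of the debug log, a grep along these lines might do. It is a sketch that assumes the `HOST: ..., SERVICE: ...,` layout shown in the example above; `find_check_result` is a hypothetical helper, and it also looks in nagios.debug.old in case the log has just been rotated.

```shell
#!/bin/bash
# Hypothetical helper: print debug-log lines carrying the raw check result
# for one host/service pair. Searches the rotated file first, then the live one.
# Args: directory holding the debug files, host name, service name.
find_check_result() {
    local dir=$1 host=$2 service=$3 f
    for f in "$dir/nagios.debug.old" "$dir/nagios.debug"; do
        [ -r "$f" ] || continue
        # -F matches the text literally, so dots in host names are not wildcards.
        grep -F -- "HOST: ${host}, SERVICE: ${service}," "$f"
    done
    return 0
}

# Example usage with the path from this thread:
# find_check_result /usr/local/nagios/var centos01 Load
```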
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: More decimal strangeness

Post by WillemDH »

Troy,

As nagios.debug seems to be renamed to nagios.debug.old once it's larger than 1 MB and then starts over again, it's almost impossible to run a check with the decimal issue before the file is replaced, which happens about every 6 seconds. Can I configure the debug file to either grow larger or write to nagios.debug.oldx instead of always overwriting the same file?

Grtz

Willem
cmerchant
Posts: 546
Joined: Wed Sep 24, 2014 11:19 am

Re: More decimal strangeness

Post by cmerchant »

To increase the Nagios debug log file size, change this entry in /usr/local/nagios/etc/nagios.cfg:

Code: Select all

max_debug_file_size=1000000
(1MB)
to

Code: Select all

max_debug_file_size=5000000
(5MB)
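At the rate reported earlier in the thread (about 1 MB every 6 seconds), 5 MB would cover roughly 30 seconds. To size the file for a longer capture window, the arithmetic could be sketched as follows; `debug_size_for_window` is a hypothetical helper, and the result is only as good as the assumption that the log keeps growing at the same rate.

```shell
#!/bin/bash
# Hypothetical helper: estimate max_debug_file_size (in bytes) for a capture window.
# Args: bytes written per rotation, seconds per rotation, desired window in seconds.
debug_size_for_window() {
    local bytes=$1 seconds=$2 window=$3
    echo $(( bytes * window / seconds ))
}

# 1 MB filled in roughly 6 seconds; aim for a 5 minute window.
debug_size_for_window 1000000 6 300
# prints: 50000000
```

The printed value would then go into max_debug_file_size, disk space permitting.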
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: More decimal strangeness

Post by WillemDH »

Hello,

I never really found the time to debug this decimal strangeness further, as the debug file was filling up too fast and I had more urgent work to do. But today I noticed similar decimal strangeness on my personal test XI server. I created a new plugin, based on an existing plugin, that gathers the percentage of CPU and memory used by a process and also counts the number of processes. And while I'm quite sure I never pass any decimals for the process count, I do see decimals appearing in my Highcharts graphs. Check the screenshot.

And this is the plugin I created (based on an existing script; it needs a little work, but it does the job):

Code: Select all

#!/bin/bash                                                                                                              
                                                                                                                         
# Script name:          check_lin_process.sh                                                                                 
# Version:              v0.01.150816                                                                                     
# Created on:           17/08/2015                                                                                       
# Author:               Willem D'Haese                                                                                   
# Purpose:              Bash script that counts processes and returns total memory and cpu perfdata
# On GitHub:            https://github.com/willemdh/check_lin_process
# On OutsideIT:         http://outsideit.net/check-lin-process                                          
# Recent History:                                                                                                        
#       17/08/15 => Creation date
# Copyright:                                                                                                             
# This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public      
# License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any         
# later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without       
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public           
# License for more details. You should have received a copy of the GNU General Public License along with this            
# program.  If not, see <http://www.gnu.org/licenses/>.                                                                  

displayhelp="false"
runcheck=""
checkresult=""
hostname=""
process=""
fancyname=""
check=""
warning=""
critical=""

optstr=hH:p:N:C:w:c: 

while getopts $optstr Switchvar
do
	case $Switchvar in
		c) critical=$OPTARG ;;
		w) warning=$OPTARG ;;
		C) check=$OPTARG ;;
		N) fancyname=$OPTARG ;;
		p) process=$OPTARG ;;
		H) hostname=$OPTARG ;;
		h) displayhelp="true" ;;
	esac
done
shift $(( $OPTIND - 1 ))

if [ "$displayhelp" == "true" ]
then
	echo "
		-C,		Specify a check type: CPU or Memory, default check type is Memory
		-c,		Specify a critical level for the check, default is 70%
		-H,		Specify hostname
		-h,		Display help information
		-N,		Specify a fancy name for the process
		-p,		Specify a process to be monitored
		-w,		Specify a warning level for the check, default is 60%
		"
	exit
fi	

if [ "$process" == "" ]
then
	echo "No process was specified. The '-p' switch must be used to specify the process"
	exit 3
fi

if [ "$fancyname" == "" ]
then
	fancyname=$process
fi

if [ "$check" != "" ]
then
	if [ "$check" == "cpu" ] || [ "$check" == "Cpu" ] || [ "$check" == "CPU" ]
	then
		check="cpu"
	elif [ "$check" == "memory" ] || [ "$check" == "Memory" ] || [ "$check" == "MEMORY" ]
	then
		check="mem"
	fi
else
	check="all"
	checktoken="1"
fi

if [ "$check" == "cpu" ]
then
	runcheckcpu=`ps -C $process -o%cpu= | paste -sd+ | bc`
	roundedcpuresult=`echo $runcheckcpu | awk '{print int($1+0.5)}'`
elif [ "$check" == "mem" ]
then
	runcheckmem=`ps -C $process -o%mem= | paste -sd+ | bc`
	roundedmemresult=`echo $runcheckmem | awk '{print int($1+0.5)}'`
elif [ "$check" == "all" ]
then
	runcheckcpu=`ps -C $process -o%cpu= | paste -sd+ | bc`
	roundedcpuresult=`echo $runcheckcpu | awk '{print int($1+0.5)}'`
	runcheckmem=`ps -C $process -o%mem= | paste -sd+ | bc`
	roundedmemresult=`echo $runcheckmem | awk '{print int($1+0.5)}'`
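	# Count by command name, like the ps -C calls above; grepping 'ps -ef' would also match this script's own command line
	proccount=$(ps -C "$process" -o pid= | wc -l)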
else
	echo "There is an error with your check's syntax. Please debug.."
	exit 3
fi

#echo "roundedcpuresult = $roundedcpuresult"
#echo "roundedmemresult = $roundedmemresult"

if [ "$warning" == "" ]
then
	warning=60
fi
if [ "$critical" == "" ]
then
	critical=70
fi

if [ "$roundedcpuresult" == "" -o  "$roundedmemresult" == "" ]
then
        echo "The "$fancyname" process doesn't appear to be running. Please debug."
        exit 3
fi

if [ "$roundedcpuresult" -ge "$critical"  -o  "$roundedmemresult" -ge "$critical" ]
then
        echo "CRITICAL: $fancyname {CPU: ${roundedcpuresult}%}{Memory: ${roundedmemresult}%}{Count: ${proccount}} | ${fancyname}_cpu=$roundedcpuresult ${fancyname}_mem=$roundedmemresult ${fancyname}_count=$proccount"
        exit 2
# Warn when either CPU or memory reaches the warning level (the original tested CPU twice)
elif [ "$roundedcpuresult" -ge "$warning" -o "$roundedmemresult" -ge "$warning" ]
then
        echo "WARNING: $fancyname {CPU: ${roundedcpuresult}%}{Memory: ${roundedmemresult}%}{Count: ${proccount}} | ${fancyname}_cpu=$roundedcpuresult ${fancyname}_mem=$roundedmemresult ${fancyname}_count=$proccount"
        exit 1
elif [ "$roundedcpuresult" -lt "$warning" -o  "$roundedmemresult" -lt "$warning" ]
then
        echo "OK: $fancyname {CPU: ${roundedcpuresult}%}{Memory: ${roundedmemresult}%}{Count: ${proccount}} | ${fancyname}_cpu=$roundedcpuresult ${fancyname}_mem=$roundedmemresult ${fancyname}_count=$proccount"
        exit 0
fi

exit 3
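The threshold logic at the end of the script could also be pulled into one small function, which makes it easier to test on its own. This is only a sketch, assuming that either metric reaching a threshold should raise the state; `nagios_state` is a hypothetical helper and not part of the plugin as posted.

```shell
#!/bin/bash
# Hypothetical helper: map rounded CPU and memory percentages to a Nagios
# exit code (0 = OK, 1 = WARNING, 2 = CRITICAL).
# Args: cpu percent, memory percent, warning level, critical level (integers).
nagios_state() {
    local cpu=$1 mem=$2 warn=$3 crit=$4
    if [ "$cpu" -ge "$crit" ] || [ "$mem" -ge "$crit" ]; then
        echo 2
    elif [ "$cpu" -ge "$warn" ] || [ "$mem" -ge "$warn" ]; then
        echo 1
    else
        echo 0
    fi
}

# Memory at 65% with the default levels of 60 and 70:
nagios_state 10 65 60 70
# prints: 1
```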

Run it like this on a Nagios XI server for example:

Code: Select all

/usr/local/nagios/libexec/check_lin_process.sh -p httpd
So why would Highcharts show decimal numbers? I'm still seeing this same behaviour at work too, for checks that only output integers.

Grtz

Willem
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: More decimal strangeness

Post by Box293 »

I believe this is a side effect of pushing the data into RRD files. The numbers get averaged, which I believe is when they become decimals.

What happens when you view the raw data of the RRD files? To do this use my Performance Data Tool.

http://exchange.nagios.org/directory/Ad ... ol/details

Upload it into Nagios XI via Admin > System Extensions > Manage Components.

There you should be able to see the raw data in the RRD files. Are you seeing decimals here?
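As a rough illustration of how averaging turns integers into decimals, here is a plain arithmetic mean over a few integer samples. This is not rrdtool's actual consolidation algorithm, only a sketch of the effect; `average` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: arithmetic mean of the arguments, printed with two decimals.
average() {
    # LC_ALL=C keeps the decimal separator a dot regardless of the system locale.
    printf '%s\n' "$@" | LC_ALL=C awk '{ sum += $1 } END { if (NR) printf "%.2f\n", sum / NR }'
}

# Three integer process counts from consecutive checks:
average 3 4 4
# prints: 3.67
```

None of the inputs has a decimal part, yet the averaged value does, which would match graphs showing decimals for a count that the plugin only ever reports as an integer.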