
Re: Combine 2 SNMP Service Checks into a graph & calculation

Posted: Fri Jul 24, 2015 9:08 am
by ssax
You'll have 1000 open processes consuming memory at the same time, with minimal CPU impact. I would personally go the tmp file route, but you should evaluate both options in a test environment to see what the impact would be.

Re: Combine 2 SNMP Service Checks into a graph & calculation

Posted: Fri Jul 24, 2015 9:14 am
by eloyd
[Edit: @ssax wrote his reply as I was writing mine, but I like to ramble more.] :mrgreen:

David,

I am going to write this ONLY from the perspective of systems management, not from a Nagios perspective of what is the best route forward. So bear with me. (That's mostly a disclaimer so Bandit won't get mad at me) :-)

Imagine you have one thing that runs every minute and takes 59 seconds to complete. In Unix, that means a process is created ("forked") that runs for that entire duration, even if most of the time is spent sleeping. Keeping track of the process, allocating memory, and swapping it to and from disk to make room for other processes is what the Unix kernel does. Now imagine you have 1000 of these things, each running for 59 seconds and checking every minute. Unix is now doing 1000 times more work, keeping track of 1000 times as much memory, process slots, swap space, etc. In general, the less work you ask the system to do, the better off everything is.

However, compare that to a disk, which is orders of magnitude slower than main memory (less so if you're using fast SSD drives these days). Writing or reading a piece of information may only take a millisecond, but if you're doing it 1000 times every minute, then you might be spending time waiting for the disk to spin around and put the data under the head, and then doing it again a millisecond later for the next one.

So - and you're not going to like this - the answer is going to be "experiment." I would think that the disk approach would be faster and easier overall (you could even populate and read a database record, but that's more complex and dependent upon a database), plus less overhead on the system as a whole. You could just create a /tmp/<host>/<service>/lastrun file that contains the data for the last run for <service> on <host>, and then echo > to the file to save your data or cat < from the file to get it back.
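To make the file idea a little more concrete, here is a minimal, untested-in-production sketch of the write and read halves. The function names save_last and load_last and the STATE_DIR variable are made up for illustration; only the /tmp/<host>/<service>/lastrun layout comes from the paragraph above.

```shell
#!/bin/bash
# Hypothetical helpers: save_last, load_last and STATE_DIR are assumed names.
STATE_DIR="${STATE_DIR:-/tmp}"

# save_last <host> <service> <value>
# Writes the value to $STATE_DIR/<host>/<service>/lastrun.
save_last() {
  local dir="$STATE_DIR/$1/$2"
  mkdir -p "$dir" && echo "$3" > "$dir/lastrun"
}

# load_last <host> <service>
# Prints the saved value, or 0 if nothing has been saved yet.
load_last() {
  local file="$STATE_DIR/$1/$2/lastrun"
  if [ -r "$file" ]; then
    cat < "$file"
  else
    echo 0
  fi
}

# Example use inside a check (host and service names are placeholders):
#   previous=$(load_last "$1" traffic_in)
#   save_last "$1" traffic_in "$current"
```

Falling back to 0 when the file is missing keeps the first run from failing, at the cost of one meaningless reading.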

Of course, sleeping for 60 seconds and running the test again is just as easy, except consider that if you have Nagios run the check every minute, and you're sleeping for 60 seconds in the check, you're actually going to run the check every 2 minutes because Nagios won't run the check again until the first one has finished. If you read a file and update it, then you can run every minute without delay.

Try it both ways for a day and see which one is better! :-)

Re: Combine 2 SNMP Service Checks into a graph & calculation

Posted: Fri Jul 24, 2015 2:10 pm
by ssax
eloyd, very thorough!

Re: Combine 2 SNMP Service Checks into a graph & calculation

Posted: Fri Jul 24, 2015 7:07 pm
by perric
Thanks again for the explanations. I was concerned that the sleep 60 might have an unexpected impact, such as the check now running every two minutes instead of every minute.

I have not written a shell script to retrieve data from a file. Sorry, I am new to shell scripting :-). Is that easy enough to do? If that's out of the scope of this forum, I can try to find out how to do it.

David

Re: Combine 2 SNMP Service Checks into a graph & calculation

Posted: Sat Jul 25, 2015 10:00 am
by eloyd
I can't test, but this should work. It's @perric's latest code, modified to use a /tmp/ip.address.goes.here.in.last and /tmp/ip.address.goes.here.out.last temp file to keep track of the previous run's data.

Code:

#!/bin/bash

# NOTE:  First reading is now read from a /tmp file stored the last time we ran
#        If previous reading is not there, we assume it was zero
#        Reading is JUST the number, not the entire output string
inbound=0
[ -r "/tmp/$1.in.last" ] && inbound=`cat "/tmp/$1.in.last"`
outbound=0
[ -r "/tmp/$1.out.last" ] && outbound=`cat "/tmp/$1.out.last"`

#### Collect 2nd reading ####
inbound_temp2="$(/usr/local/nagios/libexec/check_snmp $1 -P 2c -C $2 -o 1.3.6.1.2.1.31.1.1.1.6.$4)"
outbound_temp2="$(/usr/local/nagios/libexec/check_snmp $1 -P 2c -C $2 -o 1.3.6.1.2.1.31.1.1.1.10.$4)"
inbound2=`echo $inbound_temp2|grep -oP '(?<=\-)(.*?)(?=\|)'`
outbound2=`echo $outbound_temp2 |grep -oP '(?<=\-)(.*?)(?=\|)'`

# Save current numbers for the next run
echo "$inbound2" > /tmp/$1.in.last
echo "$outbound2" > /tmp/$1.out.last


### Perform Logic here
inbound_per=$(( (($inbound2 - $inbound) * 8) / ($3 * 1000000)))
outbound_per=$(( (($outbound2 - $outbound) * 8) / ($3 * 1000000)))
total=$(( ($inbound_per + $outbound_per)/2 ))

perfdata="| 'inbound'=$inbound_per%;;;0;100 'outbound'=$outbound_per%;;;0;100"

if [ "$total" -ge "90" ];then
  echo "CRITICAL - Total traffic above threshold - $total $perfdata"
  exit 2
elif [ "$total" -ge "80" ];then
  echo "WARNING - Total traffic above threshold - $total $perfdata"
  exit 1
else
  echo "OK - Traffic at good percentage $perfdata"
  exit 0
fi

Re: Combine 2 SNMP Service Checks into a graph & calculation

Posted: Mon Jul 27, 2015 11:22 am
by tmcdonald
perric wrote: If that's out of the scope of this forum, I can try to find out how to do it.
I would say it tip-toes on scope.

Officially, it is out of scope in that your support contract does not include us helping with custom scripts you write. However, I cannot in recent memory recall a time we have turned someone away outright. We can give pointers, highlight obvious flaws, and occasionally we can whip up a 20-line script. We won't train you, but we can help you.

And remember, this applies only to Nagios staff. If members of the community wish to help we are not going to stop them :)

However, if you are asking for help from the community and not us directly, please post in General to get a wider audience and keep our SLA in check (otherwise we need to keep posting back every 24 hours with a bogus reply to keep the thread off our dashboard, like I am sorta doing right now).

Re: Combine 2 SNMP Service Checks into a graph & calculation

Posted: Mon Jul 27, 2015 12:09 pm
by eloyd
Trevor is the master of the bogus reply. I learned everything I know from him. :-)

Re: Combine 2 SNMP Service Checks into a graph & calculation

Posted: Mon Jul 27, 2015 4:20 pm
by ssax
perric, let us know if eloyd's solution will work for you.

Re: Combine 2 SNMP Service Checks into a graph & calculation

Posted: Tue Aug 18, 2015 2:19 pm
by perric
Hi All,

I have been away and just got back. Thanks for all replies, etc. I know how to write the data to a file, but I need to figure out how to retrieve the data from a file.

The initial question is resolved.

David

Re: Combine 2 SNMP Service Checks into a graph & calculation

Posted: Tue Aug 18, 2015 2:22 pm
by eloyd
Look carefully at my sample script for how to do it in a simple case.
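Stripped down to just the file handling, the pattern is: read the old number if the file is there, save the new number, then work with the difference. A rough sketch, assuming a single counter per file; the function name next_delta is made up for illustration and is not part of the script above.

```shell
#!/bin/bash
# next_delta <state-file> <current-value>
# Hypothetical helper: prints (current - previous) and stores current
# in the state file so the next run can use it as "previous".
next_delta() {
  local file="$1" current="$2" previous=0

  # Retrieve the previous reading only if the file exists and is readable
  [ -r "$file" ] && previous=$(cat "$file")

  # Save the current reading for the next run
  echo "$current" > "$file"

  # An empty file counts as 0 so the arithmetic does not fail
  echo $(( current - ${previous:-0} ))
}

# Example use (the path is a placeholder):
#   delta=$(next_delta "/tmp/$1.in.last" "$inbound2")
```

The `[ -r file ] && var=$(cat file)` line is the whole "retrieve from a file" step; everything else is bookkeeping.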