Combine 2 SNMP Service Checks into a graph & calculation

ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Combine 2 SNMP Service Checks into a graph & calculation

Post by ssax »

You'll have 1000 open processes consuming memory at the same time, with minimal CPU impact. I would personally go the tmp file route, but you should evaluate both options in a test environment to see what the impact would be.
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Combine 2 SNMP Service Checks into a graph & calculation

Post by eloyd »

[Edit: @ssax wrote his reply as I was writing mine, but I like to ramble more.] :mrgreen:

David,

I am going to write this ONLY from the perspective of systems management, not from a Nagios perspective of what is the best route forward. So bear with me. (That's mostly a disclaimer so Bandit won't get mad at me) :-)

Imagine you have one thing that runs every minute and takes 59 seconds to complete. In Unix, that means a process is created ("forked") that runs for that entire duration, even if most of the time is spent sleeping. Keeping track of the process, allocating memory, and worrying about swapping it to and from disk to make room for other processes is what the Unix kernel does. Now imagine you have 1000 of these things, each running for 59 seconds and checking every minute. Unix is now doing 1000 times more work, keeping track of 1000 times as much memory, process slots, swap space, etc. In general, the less work you ask the system to do, the better off everything is.

However, compare that to a disk, which is orders of magnitude slower than main memory (less so if you're using fast SSD drives these days). Writing or reading a piece of information may only take a millisecond, but if you're doing it 1000 times every minute, then you might be spending time waiting for the disk to spin around and put the data under the head, and then doing it again a millisecond later for the next one.

So - and you're not going to like this - the answer is going to be "experiment." I would think that the disk approach would be faster and easier overall (you could even populate and read a database record, but that's more complex and dependent upon a database), plus less overhead on the system as a whole. You could just create a /tmp/<host>/<service>/lastrun file that contains the data for the last run for <service> on <host>, and then echo > to the file or cat < from the file to get your data.
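As a rough illustration of the /tmp/<host>/<service>/lastrun idea, here is a minimal bash sketch. The names `STATE_DIR`, `save_last`, and `load_last` are made up for this example and do not come from Nagios or from the thread; treat it as one possible layout, not a tested plugin.

```shell
#!/bin/bash
# Hypothetical helpers: STATE_DIR, save_last and load_last are illustrative names.
STATE_DIR="${STATE_DIR:-/tmp}"

# save_last HOST SERVICE VALUE: store VALUE so the next run can read it.
save_last() {
  local dir="$STATE_DIR/$1/$2"
  mkdir -p "$dir" || return 1
  # Write to a scratch file first, then rename it into place,
  # so a concurrent reader never sees a half-written file.
  printf '%s\n' "$3" > "$dir/lastrun.$$" && mv "$dir/lastrun.$$" "$dir/lastrun"
}

# load_last HOST SERVICE [DEFAULT]: print the stored value,
# or DEFAULT (0 if not given) when no readable file exists yet.
load_last() {
  local file="$STATE_DIR/$1/$2/lastrun"
  if [ -r "$file" ]; then
    cat "$file"
  else
    printf '%s\n' "${3:-0}"
  fi
}
```

In a check script this might be used as `previous=$(load_last "$host" ifInOctets)` at the start and `save_last "$host" ifInOctets "$current"` once the new reading is in hand, where `$host` and `$current` are whatever variables the script already has.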

Of course, sleeping for 60 seconds and running the test again is just as easy, except consider that if you have Nagios run the check every minute, and you're sleeping for 60 seconds in the check, you're actually going to run the check every 2 minutes because Nagios won't run the check again until the first one has finished. If you read a file and update it, then you can run every minute without delay.

Try it both ways for a day and see which one is better! :-)
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
I'm a Nagios Fanatic! • Join our public Nagios Discord Server!
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Combine 2 SNMP Service Checks into a graph & calculation

Post by ssax »

eloyd, very thorough!
perric
Posts: 161
Joined: Fri Mar 28, 2014 10:37 am

Re: Combine 2 SNMP Service Checks into a graph & calculation

Post by perric »

Thanks again for the explanations. I was concerned that the sleep 60 might have an unexpected impact, such as the check now running every two minutes instead of every minute.

I have not written a shell script to retrieve data from a file. Sorry, I am new to shell scripting :-). Is that easy enough to do? If that's out of the scope of this forum, I can try to find out how to do it.

David
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Combine 2 SNMP Service Checks into a graph & calculation

Post by eloyd »

I can't test, but this should work. It's @perric's latest code, modified to use a /tmp/ip.address.goes.here.in.last and /tmp/ip.address.goes.here.out.last temp file to keep track of the previous run's data.

Code:

#!/bin/bash

# NOTE:  First reading is now read from a /tmp file stored the last time we ran
#        If previous reading is not there, we assume it was zero
#        Reading is JUST the number, not the entire output string
# Only read a saved value if its file exists and is readable
inbound=0
[ -r "/tmp/$1.in.last" ] && inbound=$(cat "/tmp/$1.in.last")
outbound=0
[ -r "/tmp/$1.out.last" ] && outbound=$(cat "/tmp/$1.out.last")

#### Collect 2nd reading ####
inbound_temp2="$(/usr/local/nagios/libexec/check_snmp $1 -P 2c -C $2 -o 1.3.6.1.2.1.31.1.1.1.6.$4)"
outbound_temp2="$(/usr/local/nagios/libexec/check_snmp $1 -P 2c -C $2 -o 1.3.6.1.2.1.31.1.1.1.10.$4)"
inbound2=`echo $inbound_temp2|grep -oP '(?<=\-)(.*?)(?=\|)'`
outbound2=`echo $outbound_temp2 |grep -oP '(?<=\-)(.*?)(?=\|)'`

# Save current numbers for the next run
echo "$inbound2" > /tmp/$1.in.last
echo "$outbound2" > /tmp/$1.out.last


### Perform Logic here
inbound_per=$(( (($inbound2 - $inbound) * 8) / ($3 * 1000000)))
outbound_per=$(( (($outbound2 - $outbound) * 8) / ($3 * 1000000)))
total=$(( ($inbound_per + $outbound_per)/2 ))

perfdata="| 'inbound'=$inbound_per%;;;0;100 'outbound'=$outbound_per%;;;0;100"

if [ "$total" -ge "90" ];then
  echo "CRITICAL - Total traffic above threshold - $total $perfdata"
  exit 2
elif [ "$total" -ge "80" ];then
  echo "WARNING - Total traffic above threshold - $total $perfdata"
  exit 1
else
  echo "OK - Traffic at good percentage $perfdata"
  exit 0
fi
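One thing to check when testing: the arithmetic above does not divide by the time that passed between the two readings. If the goal is a percentage of link capacity, one possible approach is sketched below. It assumes that `$3` is the link speed in Mbps (the thread does not say) and that the script also saves a timestamp, for example from `date +%s`, next to each reading. The function name `utilization_pct` and its argument order are invented for this example.

```shell
#!/bin/bash
# utilization_pct PREV_OCTETS CUR_OCTETS ELAPSED_SECONDS LINK_MBPS
# Prints link utilization as an integer percentage.
# Hypothetical helper: the name and arguments are illustrative only.
utilization_pct() {
  local prev=$1 cur=$2 elapsed=$3 mbps=$4
  local delta=$(( cur - prev ))

  # A negative delta means the counter was reset (for example, a device
  # reboot), so report 0 for this run instead of a bogus negative number.
  if (( delta < 0 )); then
    delta=0
  fi

  # Avoid dividing by zero on the first run or with a bad argument.
  if (( elapsed <= 0 || mbps <= 0 )); then
    echo 0
    return
  fi

  # octets * 8 = bits; link capacity over the interval is
  # elapsed * mbps * 1000000 bits; multiply by 100 for a percentage.
  echo $(( delta * 8 * 100 / (elapsed * mbps * 1000000) ))
}
```

For instance, 75,000,000 octets transferred in 60 seconds on a 100 Mbps link is 600,000,000 bits out of a possible 6,000,000,000, so `utilization_pct 1000000 76000000 60 100` prints 10.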
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Combine 2 SNMP Service Checks into a graph & calculation

Post by tmcdonald »

perric wrote:If that's out of the scope of this forum, I can try to find out how to do it.
I would say it tip-toes on scope.

Officially, it is out of scope in that your support contract does not include us helping with custom scripts you write. However, I cannot in recent memory recall a time we have turned someone away outright. We can give pointers, highlight obvious flaws, and occasionally we can whip up a 20-line script. We won't train you, but we can help you.

And remember, this applies only to Nagios staff. If members of the community wish to help we are not going to stop them :)

However if you are asking for help from the community and not us directly, please post in General to get a wider audience and keep our SLA in check (otherwise we need to keep posting back every 24 hours with a bogus reply to keep the thread off our dashboard like I am sorta doing right now).
Former Nagios employee
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Combine 2 SNMP Service Checks into a graph & calculation

Post by eloyd »

Trevor is the master of the bogus reply. I learned everything I know from him. :-)
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Combine 2 SNMP Service Checks into a graph & calculation

Post by ssax »

perric, let us know if eloyd's solution will work for you.
perric
Posts: 161
Joined: Fri Mar 28, 2014 10:37 am

Re: Combine 2 SNMP Service Checks into a graph & calculation

Post by perric »

Hi All,

I have been away and just got back. Thanks for all replies, etc. I know how to write the data to a file, but I need to figure out how to retrieve the data from a file.

The initial question is resolved.

David
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Combine 2 SNMP Service Checks into a graph & calculation

Post by eloyd »

Look carefully at my sample script for how to do it in a simple case.
Locked