Plugin that detects network overload

turboscrew
Posts: 23
Joined: Wed Jul 30, 2014 6:15 am

Post by turboscrew »

Any ideas for a network interface traffic monitoring plugin that could detect if a NIC is overloaded?
That is: too many heavy networking programs running. I guess that implies some known time over which the traffic
is calculated so that short irregular peaks don't cause notifications.

I've been checking out some plugins, but none of them mention the time period over which the traffic is measured.

If a graph template came with it, that would be really cool: after an alarm, you could have a look at the load statistics.
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Plugin that detects network overload

Post by sreinhardt »

I don't think anything like this currently exists. You might check out bischeck, as it does store data from repeat checks and compares the differences between them, warning on too high of a delta between checks. Otherwise I think this is something that is a great idea, but would likely need to be developed.
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Plugin that detects network overload

Post by tmcdonald »

Otherwise you might look into using MRTG:

http://bigunix.blogspot.com/2009/07/how ... g-and.html
Former Nagios employee
rhassing
Posts: 412
Joined: Sat Oct 05, 2013 10:29 pm
Location: Netherlands

Re: Plugin that detects network overload

Post by rhassing »

There is a small problem with using SNMP for bandwidth monitoring. You only get an average of the bandwidth used over the polling interval, because the device is typically polled only once every 5 minutes.
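
To illustrate the averaging effect, here is a minimal sketch with made-up numbers. The function name `avg_util` and the counter values are hypothetical, and counter wrap-around is ignored. It computes the utilization from two octet-counter readings, the way an SNMP-based check would:

```shell
#!/bin/bash
# Average utilization in percent between two octet-counter readings.
# Arguments: first counter, second counter, seconds between polls, line speed in bit/s.
# Assumption: the counter did not wrap between the two readings.
avg_util() {
    local c1=$1 c2=$2 interval=$3 speed=$4
    echo $(( (c2 - c1) * 8 * 100 / (interval * speed) ))
}

# Hypothetical case: a 1 Gbit/s link saturated for 30 s (3,750,000,000 bytes)
# and idle for the remaining 270 s of a 5-minute polling interval.
avg_util 0 3750000000 300 1000000000   # seen over 300 s: prints 10
avg_util 0 3750000000 30 1000000000    # seen over 30 s:  prints 100
```

With a 5-minute poll the fully saturated half minute shows up as only 10% utilization, which is why a shorter sampling interval helps here.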

If your switches/devices support it, it would be better to use sFlow to collect your bandwidth statistics. Normally this is done with an sFlow collector like sflowtool (http://www.inmon.com/bin/sflowtool-3.32.tar.gz), a daemon which collects sFlow bandwidth data.

With the following script you could put this info in RRD files:

Code:

#! /usr/bin/perl
# Copyright (c) 2001 InMon Corp. Licensed under the terms of the InMon sFlow licence:
# http://www.inmon.com/technology/sflowlicense.txt

#makes things work when run without install
use lib qw( ../perl-shared/blib/lib ../perl-shared/blib/arch );

#makes program work AFTER install
use lib qw( /usr/local/rrdtool-1.0.33/lib/perl ../lib/perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/auto/RRDs );

use vars qw(@ISA $loaded);
use RRDs;

$dataDir="/var/spool/sflow/rrd";
$dataDir2="/var/spool/sflow/discards";

while(<>) {
    ($key, $value) = split /[\t =]+/, $_;
    chomp $key;
    chomp $value;

    if($key eq "unixSecondsUTC") {$t = $value;}
    if($key eq "agent") {$agent = $value;}
    if($key eq "ifIndex") {$ifIndex = $value;}
    if($key eq "ifSpeed") {$ifSpeed = $value;}
    if($key eq "ifInOctets") {$ifInOctets = $value;}
    if($key eq "ifOutOctets") {
        $ifOutOctets = $value;

        $rrd = "$dataDir/$agent-$ifIndex.rrd";
        if(! -e $rrd) {
            RRDs::create ($rrd, "--start",$t-1, "--step",20,
                         "DS:bytesIn:COUNTER:120:0:10000000000",
                         "DS:bytesOut:COUNTER:120:0:10000000000",
                         "RRA:MIN:0.5:1:44640",
                         "RRA:MAX:0.5:1:44640",
                         "RRA:AVERAGE:0.5:3:525600");
            $ERROR = RRDs::error;
            die "$0: unable to create `$rrd': $ERROR\n" if $ERROR;
        }
        # "ne" compares strings; "!=" would compare the addresses numerically
        if ($agent ne "0.0.0.0") {
            RRDs::update $rrd, "$t:$ifInOctets:$ifOutOctets";
            if ($ERROR = RRDs::error) {
                die "$0: unable to update `$rrd': $ERROR\n";
            }
        }
    }

    # $t, $agent and $ifIndex are already captured by the checks above
    if($key eq "ifInErrors") {$ifInErrors = $value;}
    if($key eq "ifOutErrors") {$ifOutErrors = $value;}
    if($key eq "ifInDiscards") {$ifInDiscards = $value;}
    if($key eq "ifOutDiscards") {
        $ifOutDiscards = $value;

        $rrd2 = "$dataDir2/$agent-$ifIndex-discards.rrd";
        if(! -e $rrd2) {
            RRDs::create ($rrd2, "--start",$t-1, "--step",20,
                         "DS:DicardsIn:COUNTER:120:0:10000000",
                         "DS:ErrorsIn:COUNTER:120:0:10000000",
                         "DS:DicardsOut:COUNTER:120:0:10000000",
                         "DS:ErrorsOut:COUNTER:120:0:10000000",
                         "RRA:AVERAGE:0.5:3:576");
            $ERROR = RRDs::error;
            die "$0: unable to create `$rrd2': $ERROR\n" if $ERROR;
        }
        # "ne" compares strings; "!=" would compare the addresses numerically
        if ($agent ne "0.0.0.0") {
            RRDs::update $rrd2, "$t:$ifInDiscards:$ifInErrors:$ifOutDiscards:$ifOutErrors";
            if ($ERROR = RRDs::error) {
                die "$0: unable to update `$rrd2': $ERROR\n";
            }
        }
    }



}
Then you would have your graphs :-)

The following script could be used to find out your utilization for the interfaces.

Code:

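#!/bin/bash
# Nagios plugin exit codes
OK=0
WARNING=1
CRITICAL=2
UNKNOWN=3
# usage and fetch errors are reported as UNKNOWN
ERROR=$UNKNOWN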

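ip=$1
port=$2
lw=$3
c=$4
w=$5

# ip and port are mandatory; the thresholds must be given together
if [ -z "$ip" ] || [ -z "$port" ] || { [ -n "$c" ] && [ -z "$w" ]; }
then
        echo "usage: $0 <ipaddress> <port number> [<linewidth> [<critical> <warning>]]"
        exit $ERROR
fi

# defaults: critical at 30%, warning at 25%, 1G line speed
c=${c:-30}
w=${w:-25}
lw=${lw:-1G}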

dir=/var/spool/sflow/rrd


case $lw in
1M)
        lw=1000000
        ;;
10M)
        lw=10000000
        ;;
100M)
        lw=100000000
        ;;
1G)
        lw=1000000000
        ;;
10G)
        lw=10000000000
        ;;
100G)
        lw=100000000000
        ;;
*)
        echo "linewidth value incorrect, must be 1M, 10M, 100M, 1G, 10G or 100G"
        exit $ERROR
        ;;
esac

file=$dir/$ip-$port.rrd

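# run rrdtool on its own so that its exit status is checked, not that of tail
if ! data=$(rrdtool fetch "$file" AVERAGE -s -360 -e -120)
then
        echo "Could not fetch rrd data from $file"
        exit ${UNKNOWN:-3}
fi
# keep the last 5 data rows; drop the header line and rows without data (NaN)
set -- $(echo "$data" | grep ':' | grep -v -i nan | tail -5)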
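sin=0
sout=0
rows=0
while [ -n "$1" ]
do
        t=$1
        in=$2
        out=$3
        shift 3

        # skip rows without data
        case "$in$out" in
        *[Nn][Aa][Nn]*) continue ;;
        esac

        # rrdtool prints values in scientific notation (e.g. 1.2345e+05);
        # printf converts them to integers and also handles negative exponents
        in=$(LC_ALL=C printf '%.0f' "$in")
        out=$(LC_ALL=C printf '%.0f' "$out")

        (( sin += in ))
        (( sout += out ))
        (( rows += 1 ))
done

if [ "$rows" -eq 0 ]
then
        echo "UNKNOWN: no data rows found in $file"
        exit ${UNKNOWN:-3}
fi

# average bytes/s over the rows read, converted to bits/s,
# then expressed as a percentage of the line speed
((ain = sin * 8 / rows ))
((aout = sout * 8 / rows ))
((pin = ain * 100 / lw))
((pout = aout * 100 / lw))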

if [ $pin -ge $c -o $pout -ge $c ]
then
echo "CRITICAL: RX: $pin% TX: $pout%"
        exit $CRITICAL
fi

if [ $pin -ge $w -o $pout -ge $w ]
then
echo "WARNING: RX: $pin% TX: $pout%"
        exit $WARNING
fi
echo "OK: RX: $pin% TX: $pout%"
exit $OK
In this script you can adjust the time window over which the averages are calculated: it is set by the -s and -e arguments to rrdtool fetch, together with the number of rows kept by tail.
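
As a rough sketch of how the window could be made adjustable: the helper below averages whatever rows it is given, so the window is decided purely by the rrdtool fetch arguments. The function name `fetch_avg`, the `window` variable and the sample rows are hypothetical, and the input is assumed to look like normal rrdtool fetch output.

```shell
#!/bin/bash
# Average the two value columns of `rrdtool fetch` output read from stdin.
# Data rows look like "1406700000: 1.0000000000e+06 2.0000000000e+05".
# The header line, blank lines and NaN rows are skipped.
# Prints "<avg in> <avg out>" in bytes/s; returns non-zero if no row had data.
fetch_avg() {
    LC_ALL=C awk '
        /:/ && tolower($0) !~ /nan/ { in_sum += $2; out_sum += $3; n++ }
        END {
            if (n == 0) exit 1
            printf "%.0f %.0f\n", in_sum / n, out_sum / n
        }'
}

# Intended use (assumption: window is in seconds, end offset as in the script above):
#   window=600
#   rrdtool fetch "$file" AVERAGE -s "-$window" -e -120 | fetch_avg

# Demonstration with made-up rows instead of a real RRD file:
printf '%s\n' \
    '                        bytesIn            bytesOut' \
    '' \
    '1406700000: 1.0000000000e+06 2.0000000000e+05' \
    '1406700060: 3.0000000000e+06 4.0000000000e+05' \
    '1406700120: -nan -nan' | fetch_avg
# prints: 2000000 300000
```

Because the average is divided by the number of rows actually read, changing the window does not require touching the arithmetic.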

I think I could make this into a nice project on http://exchange.nagios.org/ :-)
Rob Hassing
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Plugin that detects network overload

Post by tmcdonald »

@turboscrew: I think rhassing has the best suggestion so far. Let us know what you think and we can move forward with more specific assistance.
Former Nagios employee