
Re: Monitoring F5 Load balancers

Posted: Fri Jul 15, 2016 11:58 am
by aap
Thanks for your help.

see attached

Re: Monitoring F5 Load balancers

Posted: Fri Jul 15, 2016 12:32 pm
by tgriep
Can you log in to the XI server as root, run the following commands, and post the output?

Code:

/usr/local/nagios/libexec/check_f5.pl -H 10.127.220.150 -m fan -t 30 -C <Communityname>
/usr/local/nagios/libexec/check_f5.pl --help
Replace <Communityname> with the settings from your device.
If the above works, you will need to add the community name to your service check.

Re: Monitoring F5 Load balancers

Posted: Fri Jul 15, 2016 12:45 pm
by aap
See both results below

Average fan speed 10546 rpm.|fan_speed=10546;;;0;

Usage: /usr/local/nagios/libexec/check_f5.pl -H host [-C community] [-v][-d][-h][-M][-t timout]
[--no-perf] [--no-optimise]
[-f cachefile] [-x cacheexpiretime]
[-m modules]

-H --host : Specify F5 hostname (f5)
Can be used multiple times; the first to answer is used.
-C --community : Specify SNMP community string (public)
-d --debug : Debug mode. Use multiple times for more detail
-h --help : Show help
-M --mrtg : MRTG mode. Also --no-mrtg for Nagios mode.
-t --timeout : SNMP query timeout (10)
-f --cache-file : File basename for SNMP cache (/tmp/f5.cache)
-x --cache-expire : Seconds before cache becomes invalid (290)
--no-perf : Disable perfstats in Nagios output
--no-optimise : Retrieve entire SNMP tree for tables (use in conjunction
with cache if many separate server() checks being done)
-m --modules : List modules to enable. Space separated. Can be used
multiple times if required. See below.
-v --verbose : verbose logging

MRTG mode
In MRTG mode, only the first module to provide a metric will be output.

Available modules:
cpu[:n][(warnpc,critpc)] MRTG: user and idle percent (80,90)
mem[ory][:percent][(warnpc,critpc)] MRTG: used and total bytes (or %) (80,90)
temp[erature][(warn,crit)]
fan No MRTG output
psu No MRTG output
health Same as 'cpu mem temp fan psu'
ssl[:server][(activewarn,activecrit)] MRTG: active and total SSL (1400,1000)
traffic[:server] MRTG: bytes in/out. No Nagios.
server:name[(actvw,actvc)] For virt server name (1400,1000)
server:ipaddr[:port][(actvw,actvc)] For virt server ipaddr:port
server[(actvw,actvc)] Over ALL virtual servers (140000,100000)
cert[ificate][:certname][([warn,]crit)] Check certificate days left, no MRTG
conn[ections][(actvw,actvc)] same as 'server'
group Failover health. No MRTG output.
cm Same as 'group'
cache No output; prepare SNMP cache

Server checks:
For server checks, the number of active connections will be thresholded.
Next, the availability of the Server will be checked and will return WARN if
not all the active pool members are available.

If the same module is used multiple times, only the first one will be used.

Examples:
/usr/local/nagios/libexec/check_f5.pl -H myf5 -C public -m 'cpu(80,90) server:/Production/foobar'
/usr/local/nagios/libexec/check_f5.pl -H myf5 -m 'health certificates(14,7) conn(1500,2000)'

Re: Monitoring F5 Load balancers

Posted: Fri Jul 15, 2016 12:54 pm
by tgriep
You need to edit the services for the fxtplbdevp1 host and add the missing option

Code:

-C <community>
for all of the SNMP checks and that should resolve the timeouts you are seeing.
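As a sketch, the resulting command and service definitions might look something like the following. The command name, macro layout, and service template here are assumptions for illustration; only the `-C` addition and the `check_f5.pl` path and flags come from this thread. Replace `<community>` with your device's community string.

```
; hypothetical command definition -- passes the module as $ARG1$
; and the SNMP community as $ARG2$
define command {
    command_name    check_f5_module
    command_line    $USER1$/check_f5.pl -H $HOSTADDRESS$ -m $ARG1$ -t 30 -C $ARG2$
}

; hypothetical service using that command for the fan check
define service {
    host_name               fxtplbdevp1
    service_description     F5 Fan Speed
    check_command           check_f5_module!fan!<community>
    use                     generic-service
}
```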

Re: Monitoring F5 Load balancers

Posted: Fri Jul 15, 2016 1:02 pm
by aap
Done.

I will monitor and confirm. Thanks for looking into it.

Re: Monitoring F5 Load balancers

Posted: Fri Jul 15, 2016 1:03 pm
by tgriep
You're welcome. Glad to help.

Re: Monitoring F5 Load balancers

Posted: Mon Jul 18, 2016 2:41 am
by aap
Unfortunately, the issue persists.

I've had 57 false Critical alerts in the last 24 hours because of service check timeouts. I am not experiencing the same problem with any other services, though, which leads me to believe it may be the check script itself or the device.

Is there anything else we can check?

Re: Monitoring F5 Load balancers

Posted: Mon Jul 18, 2016 10:09 am
by tgriep
It looks like the plugin has a default timeout of 10 seconds which might not be long enough in your environment.
To change this to a longer timeout, edit the command in Core Configuration Manager and add the following to it.

Code:

-t 60
That will increase the timeout and hopefully that will fix the issue.
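For example, the command definition in Core Configuration Manager might end up looking like this. The command name and macro layout are assumptions for illustration; only the `-t 60` change and the plugin path and flags come from this thread.

```
; hypothetical command definition with the raised SNMP timeout;
; -t 60 replaces the plugin's 10-second default
define command {
    command_name    check_f5_health
    command_line    $USER1$/check_f5.pl -H $HOSTADDRESS$ -m health -t 60 -C $ARG1$
}
```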

Re: Monitoring F5 Load balancers

Posted: Mon Jul 18, 2016 10:13 am
by rkennedy
Is the load high on the F5 device, by any chance? I wonder if it's just getting overloaded. Are you using this check_f5 plugin for any other load balancers? Just trying to narrow it down so we can figure out the root cause.

What value are you using currently for the timeout? I've seen SNMP take 2-3 mins before to respond. I wonder if it would work if you increased it to -t 120 or -t 180 as it sounds isolated to this device.

Re: Monitoring F5 Load balancers

Posted: Tue Jul 19, 2016 4:50 am
by aap
Unfortunately, the highest timeout the script accepts is 60s. See the output below:

COMMAND: /usr/local/nagios/libexec/check_f5.pl -H 10.xxx.xxx.xxx -m health -t 180 -C public
OUTPUT: SNMP Error: The timeout value 180 is out of range (1..60)