Monitoring F5 Load balancers

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
aap
Posts: 180
Joined: Wed Oct 12, 2011 4:01 am

Re: Monitoring F5 Load balancers

Post by aap »

Thanks for your help.

See attached.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Monitoring F5 Load balancers

Post by tgriep »

Can you log in to the XI server as root, run the following commands, and post the output?

Code:

/usr/local/nagios/libexec/check_f5.pl -H 10.127.220.150 -m fan -t 30 -C <Communityname>
/usr/local/nagios/libexec/check_f5.pl --help
Replace <Communityname> with the SNMP community string configured on your device.
If the above works, you will need to add the community name to your service check.
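
When running the test command by hand, it can help to see which state Nagios would assign to the result. Here is a minimal sketch, assuming check_f5.pl follows the usual plugin exit code convention (0 = OK, 1 = WARNING, 2 = CRITICAL, anything else = UNKNOWN). The helper name nagios_state is hypothetical and is not part of the plugin or of Nagios XI.

```shell
# Hypothetical helper: translate a plugin exit code into a Nagios state name.
# Assumes the standard plugin convention: 0=OK, 1=WARNING, 2=CRITICAL, other=UNKNOWN.
nagios_state() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    *) echo "UNKNOWN" ;;
  esac
}

# Usage (plugin path and options taken from the post above):
#   /usr/local/nagios/libexec/check_f5.pl -H 10.127.220.150 -m fan -t 30 -C <Communityname>
#   nagios_state $?
```

If the command prints the fan speed and the helper reports OK, the community string is working from the XI server.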
Be sure to check out our Knowledgebase for helpful articles and solutions!
aap
Posts: 180
Joined: Wed Oct 12, 2011 4:01 am

Re: Monitoring F5 Load balancers

Post by aap »

See both results below

Average fan speed 10546 rpm.|fan_speed=10546;;;0;

Usage: /usr/local/nagios/libexec/check_f5.pl -H host [-C community] [-v][-d][-h][-M][-t timout]
       [--no-perf] [--no-optimise]
       [-f cachefile] [-x cacheexpiretime]
       [-m modules]

-H --host         : Specify F5 hostname (f5)
                    Can be used multiple times; the first to answer is used.
-C --community    : Specify SNMP community string (public)
-d --debug        : Debug mode. Use multiple times for more detail
-h --help         : Show help
-M --mrtg         : MRTG mode. Also --no-mrtg for Nagios mode.
-t --timeout      : SNMP query timeout (10)
-f --cache-file   : File basename for SNMP cache (/tmp/f5.cache)
-x --cache-expire : Seconds before cache becomes invalid (290)
   --no-perf      : Disable perfstats in Nagios output
   --no-optimise  : Retrieve entire SNMP tree for tables (use in conjunction
                    with cache if many separate server() checks being done)
-m --modules      : List modules to enable. Space separated. Can be used
                    multiple times if required. See below.
-v --verbose      : verbose logging

MRTG mode
In MRTG mode, only the first module to provide a metric will be output.

Available modules:
cpu[:n][(warnpc,critpc)]                 MRTG: user and idle percent (80,90)
mem[ory][:percent][(warnpc,critpc)]      MRTG: used and total bytes (or %) (80,90)
temp[erature][(warn,crit)]
fan                                      No MRTG output
psu                                      No MRTG output
health                                   Same as 'cpu mem temp fan psu'
ssl[:server][(activewarn,activecrit)]    MRTG: active and total SSL (1400,1000)
traffic[:server]                         MRTG: bytes in/out. No Nagios.
server:name[(actvw,actvc)]               For virt server name (1400,1000)
server:ipaddr[:port][(actvw,actvc)]      For virt server ipaddr:port
server[(actvw,actvc)]                    Over ALL virtual servers (140000,100000)
cert[ificate][:certname][([warn,]crit)]  Check certificate days left, no MRTG
conn[ections][(actvw,actvc)]             same as 'server'
group                                    Failover health. No MRTG output.
cm                                       Same as 'group'
cache                                    No output; prepare SNMP cache

Server checks:
For server checks, the number of active connections will be thresholded.
Next, the availability of the Server will be checked and will return WARN if
not all the active pool members are available.

If the same module is used multiple times, only the first one will be used.

Examples:
/usr/local/nagios/libexec/check_f5.pl -H myf5 -C public -m 'cpu(80,90) server:/Production/foobar'
/usr/local/nagios/libexec/check_f5.pl -H myf5 -m 'health certificates(14,7) conn(1500,2000)'
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Monitoring F5 Load balancers

Post by tgriep »

You need to edit the services for the fxtplbdevp1 host and add the missing option

Code:

-C <community>

to all of the SNMP checks. That should resolve the timeouts you are seeing.
aap
Posts: 180
Joined: Wed Oct 12, 2011 4:01 am

Re: Monitoring F5 Load balancers

Post by aap »

Done.

I will monitor and confirm. Thanks for looking into it.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Monitoring F5 Load balancers

Post by tgriep »

You're welcome. Glad to help.
aap
Posts: 180
Joined: Wed Oct 12, 2011 4:01 am

Re: Monitoring F5 Load balancers

Post by aap »

Unfortunately, the issue persists.

I've had 57 false Critical alerts in the last 24 hours because of service check timeouts. I am not experiencing the same problem with any of the other services, though, which leads me to believe it may be the check script itself or the device.

Is there anything else we can check?
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Monitoring F5 Load balancers

Post by tgriep »

It looks like the plugin has a default timeout of 10 seconds, which might not be long enough in your environment.
To set a longer timeout, edit the command in Core Configuration Manager and add the following to it.

Code:

-t 60

That will increase the timeout, which will hopefully fix the issue.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Monitoring F5 Load balancers

Post by rkennedy »

Is the load high on the F5 device by any chance? I wonder if it's just getting overloaded. Are you using this check f5 plugin for any other load balancers? Just trying to narrow it down to see if we can figure out the root cause.

What value are you currently using for the timeout? I've seen SNMP take 2-3 minutes to respond before. Since this sounds isolated to this one device, I wonder if it would work if you increased it to -t 120 or -t 180.
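
To get a feel for how long the device actually takes to answer, the check could be timed over several runs from the XI server. This is only a rough sketch: time_check is a hypothetical helper (not part of Nagios XI or the plugin), and it measures whole seconds only.

```shell
# Hypothetical helper: run a command N times and report the exit code and
# elapsed wall-clock seconds (1-second resolution) for each run.
time_check() {
  runs=$1
  shift
  i=1
  while [ "$i" -le "$runs" ]; do
    start=$(date +%s)
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?   # discard plugin output, keep its exit code
    end=$(date +%s)
    echo "run=$i rc=$rc seconds=$((end - start))"
    i=$((i + 1))
  done
}

# Usage (plugin path from this thread; host and community are placeholders):
#   time_check 10 /usr/local/nagios/libexec/check_f5.pl -H <host> -m health -t 60 -C <community>
```

If most runs finish in a second or two and only a few approach the timeout, that would point at intermittent slowness on the device rather than at the plugin itself.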
Former Nagios Employee
aap
Posts: 180
Joined: Wed Oct 12, 2011 4:01 am

Re: Monitoring F5 Load balancers

Post by aap »

Unfortunately, the highest timeout the script accepts is 60 seconds. See the output below:

COMMAND: /usr/local/nagios/libexec/check_f5.pl -H 10.xxx.xxx.xxx -m health -t 180 -C public
OUTPUT: SNMP Error: The timeout value 180 is out of range (1..60)
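
Given the (1..60) range in the error above, any wrapper that passes a timeout through to the plugin would need to keep the value inside that range. A minimal sketch, where clamp_timeout is a hypothetical helper and the 1 and 60 limits are taken only from the error message shown here:

```shell
# Hypothetical helper: clamp a requested timeout (whole seconds) to the
# 1..60 range reported in the SNMP error message above.
clamp_timeout() {
  t=$1
  if [ "$t" -lt 1 ]; then
    t=1
  fi
  if [ "$t" -gt 60 ]; then
    t=60
  fi
  echo "$t"
}

# Usage (host and community are placeholders):
#   /usr/local/nagios/libexec/check_f5.pl -H <host> -m health -t "$(clamp_timeout 180)" -C <community>
```

This only avoids the out-of-range error; it does not help if the device genuinely needs more than 60 seconds to respond.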