Thanks for your help.
See attached.
Monitoring F5 Load balancers
Re: Monitoring F5 Load balancers
Re: Monitoring F5 Load balancers
Can you log in to the XI server as root, run the following commands, and post the output?
Replace <Communityname> with the settings from your device.
/usr/local/nagios/libexec/check_f5.pl -H 10.127.220.150 -m fan -t 30 -C <Communityname>
/usr/local/nagios/libexec/check_f5.pl --help
If the above works, you will need to add the community name to your service check.
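Since the problem in this thread is check timeouts, it can help to see how long a manual run actually takes and which Nagios state its exit code maps to. The following is a minimal sketch for a POSIX shell; `timed_check` is a hypothetical helper name, not part of check_f5.pl or Nagios XI. The exit-code mapping follows the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
# Hypothetical helper (not part of check_f5.pl or Nagios XI): run a plugin
# command once, let its own output through, then print a summary line with
# the exit code, the matching Nagios state and the elapsed wall-clock time.
timed_check() {
  start=$(date +%s)            # one-second resolution
  if "$@"; then
    rc=0
  else
    rc=$?                      # the plugin's non-zero exit code
  fi
  end=$(date +%s)
  # Standard Nagios plugin exit codes; anything unexpected counts as UNKNOWN.
  case $rc in
    0) state=OK ;;
    1) state=WARNING ;;
    2) state=CRITICAL ;;
    *) state=UNKNOWN ;;
  esac
  echo "exit=$rc state=$state elapsed=$((end - start))s"
}
```

For example, `timed_check /usr/local/nagios/libexec/check_f5.pl -H 10.127.220.150 -m fan -t 30 -C <Communityname>` prints the plugin output followed by a line of the form `exit=<code> state=<STATE> elapsed=<seconds>s`.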
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Monitoring F5 Load balancers
See both results below:
Average fan speed 10546 rpm.|fan_speed=10546;;;0;
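The fan check output above uses the usual Nagios plugin format: human-readable text, a `|`, then performance data items of the form `label=value;warn;crit;min;max`. A minimal sketch for pulling one value out of that string in a POSIX shell; `perf_value` is a hypothetical helper, and it assumes labels contain no spaces (quoted labels with spaces are not handled).

```shell
# Hypothetical helper: print the value of one perfdata label from a Nagios
# plugin output line "text|label=value;warn;crit;min;max ...".
# Assumes space-separated perfdata items and labels without spaces.
perf_value() {
  output=$1
  label=$2
  case $output in
    *'|'*) perf=${output#*|} ;;   # perfdata is everything after the first '|'
    *) return 1 ;;                # no perfdata section at all
  esac
  for item in $perf; do
    case $item in
      "$label"=*)
        value=${item#*=}          # drop "label="
        echo "${value%%;*}"       # keep only the part before the first ';'
        return 0
        ;;
    esac
  done
  return 1                        # label not found
}
```

For example, `perf_value 'Average fan speed 10546 rpm.|fan_speed=10546;;;0;' fan_speed` prints `10546`.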
Usage: /usr/local/nagios/libexec/check_f5.pl -H host [-C community] [-v][-d][-h][-M][-t timout]
[--no-perf] [--no-optimise]
[-f cachefile] [-x cacheexpiretime]
[-m modules]
-H --host : Specify F5 hostname (f5)
Can be used multiple times; the first to answer is used.
-C --community : Specify SNMP community string (public)
-d --debug : Debug mode. Use multiple times for more detail
-h --help : Show help
-M --mrtg : MRTG mode. Also --no-mrtg for Nagios mode.
-t --timeout : SNMP query timeout (10)
-f --cache-file : File basename for SNMP cache (/tmp/f5.cache)
-x --cache-expire : Seconds before cache becomes invalid (290)
--no-perf : Disable perfstats in Nagios output
--no-optimise : Retrieve entire SNMP tree for tables (use in conjunction
with cache if many separate server() checks being done)
-m --modules : List modules to enable. Space separated. Can be used
multiple times if required. See below.
-v --verbose : verbose logging
MRTG mode
In MRTG mode, only the first module to provide a metric will be output.
Available modules:
cpu[:n][(warnpc,critpc)] MRTG: user and idle percent (80,90)
mem[ory][:percent][(warnpc,critpc)] MRTG: used and total bytes (or %) (80,90)
temp[erature][(warn,crit)]
fan No MRTG output
psu No MRTG output
health Same as 'cpu mem temp fan psu'
ssl[:server][(activewarn,activecrit)] MRTG: active and total SSL (1400,1000)
traffic[:server] MRTG: bytes in/out. No Nagios.
server:name[(actvw,actvc)] For virt server name (1400,1000)
server:ipaddr[:port][(actvw,actvc)] For virt server ipaddr:port
server[(actvw,actvc)] Over ALL virtual servers (140000,100000)
cert[ificate][:certname][([warn,]crit)] Check certificate days left, no MRTG
conn[ections][(actvw,actvc)] same as 'server'
group Failover health. No MRTG output.
cm Same as 'group'
cache No output; prepare SNMP cache
Server checks:
For server checks, the number of active connections will be thresholded.
Next, the availability of the Server will be checked and will return WARN if
not all the active pool members are available.
If the same module is used multiple times, only the first one will be used.
Examples:
/usr/local/nagios/libexec/check_f5.pl -H myf5 -C public -m 'cpu(80,90) server:/Production/foobar'
/usr/local/nagios/libexec/check_f5.pl -H myf5 -m 'health certificates(14,7) conn(1500,2000)'
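Per the help text, a module can take optional thresholds written as `name(warn,crit)`, and several modules go space-separated into one quoted `-m` argument. A small sketch of building that argument in shell; `f5_module` is a hypothetical helper, and it only covers the two-threshold form (not variants such as `cpu:n` or the crit-only certificate form).

```shell
# Hypothetical helper: format one check_f5.pl module spec.
# With both thresholds it prints name(warn,crit); otherwise just the name.
f5_module() {
  name=$1
  warn=${2:-}
  crit=${3:-}
  if [ -n "$warn" ] && [ -n "$crit" ]; then
    printf '%s(%s,%s)\n' "$name" "$warn" "$crit"
  else
    printf '%s\n' "$name"
  fi
}
```

For example, `-m "$(f5_module cpu 80 90) $(f5_module fan)"` expands to `-m "cpu(80,90) fan"`.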
Re: Monitoring F5 Load balancers
You need to edit the services for the fxtplbdevp1 host and add the missing option to all of the SNMP checks; that should resolve the timeouts you are seeing.
-C <community>
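To confirm the community string works for every SNMP check before editing the services, one option is to run each module on its own with `-C`. A sketch assuming the health modules listed in the help output (`cpu mem temp fan psu`); `check_modules` is a hypothetical helper, and the `-t 30` value is simply the one used earlier in this thread.

```shell
# Hypothetical helper: run the plugin once per module with the given
# community string and report each module's exit code.
check_modules() {
  plugin=$1
  host=$2
  community=$3
  shift 3
  for module in "$@"; do
    if "$plugin" -H "$host" -C "$community" -m "$module" -t 30 >/dev/null 2>&1; then
      rc=0
    else
      rc=$?                    # non-zero: this check still needs attention
    fi
    echo "$module exit=$rc"
  done
}
```

For example, `check_modules /usr/local/nagios/libexec/check_f5.pl <f5-address> <community> cpu mem temp fan psu` prints one `module exit=<code>` line per module.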
Re: Monitoring F5 Load balancers
Done.
I will monitor and confirm. Thanks for looking into it.
Re: Monitoring F5 Load balancers
You're welcome. Glad to help.
Re: Monitoring F5 Load balancers
Unfortunately, the issue persists.
I've had 57 false Critical alerts in the last 24 hours because of service check timeouts. I am not seeing the same problem with any other services, though, which leads me to believe the cause may be the check script itself or the device.
Is there anything else we can check?
Re: Monitoring F5 Load balancers
It looks like the plugin has a default timeout of 10 seconds, which might not be long enough in your environment.
To change this to a longer timeout, edit the command in Core Configuration Manager and add the following option to it:
-t 60
That will increase the timeout and will hopefully fix the issue.
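Before settling on a `-t` value, it may be worth measuring how long the check really takes over several runs. A minimal sketch; `max_runtime` is a hypothetical helper that reports the slowest of N runs in whole seconds (it uses `date +%s`, so timings are rounded to the second). The help text describes `-t` as the SNMP query timeout, so a full run that issues several queries may take longer than `-t` seconds.

```shell
# Hypothetical helper: run a command N times and print the slowest run
# in whole seconds. The command's exit code is ignored; only timing matters.
max_runtime() {
  runs=$1
  shift
  max=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || :
    end=$(date +%s)
    elapsed=$((end - start))
    if [ "$elapsed" -gt "$max" ]; then
      max=$elapsed
    fi
    i=$((i + 1))
  done
  echo "$max"
}
```

For example, `max_runtime 5 /usr/local/nagios/libexec/check_f5.pl -H <f5-address> -C <community> -m health -t 60` prints the slowest of five runs.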
Re: Monitoring F5 Load balancers
Is the load high on the F5 device by any chance? I wonder if it's just getting overloaded. Are you using this check_f5 plugin for any other load balancers? I'm trying to narrow it down so we can find the root cause.
What timeout value are you currently using? I've seen SNMP take 2-3 minutes to respond before. Since this sounds isolated to this device, I wonder if it would work if you increased it to -t 120 or -t 180.
Former Nagios Employee
Re: Monitoring F5 Load balancers
Unfortunately, the maximum timeout the script accepts is 60 seconds. See the output below:
COMMAND: /usr/local/nagios/libexec/check_f5.pl -H 10.xxx.xxx.xxx -m health -t 180 -C public
OUTPUT: SNMP Error: The timeout value 180 is out of range (1..60)
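The error text shows the value passed with `-t` is validated against a 1 to 60 second range (it appears to come from the Perl Net::SNMP module the plugin uses, though that is an assumption here). If the timeout is set from a variable, a sketch like this keeps it within that range, so a larger request degrades to the 60-second maximum instead of producing the SNMP Error above; `clamp_timeout` is a hypothetical helper.

```shell
# Hypothetical helper: keep a requested check_f5.pl -t value inside the
# 1..60 second range reported by the "out of range (1..60)" error.
# Assumes an integer argument.
clamp_timeout() {
  t=$1
  if [ "$t" -lt 1 ]; then
    t=1
  elif [ "$t" -gt 60 ]; then
    t=60
  fi
  echo "$t"
}
```

For example, `-t "$(clamp_timeout 180)"` passes `-t 60` to the plugin.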