Monitoring F5 Load balancers

Posted: Fri Jul 08, 2016 8:56 am
by aap
Hi,

Does anyone have a good procedure and plugin to monitor F5 load balancers on Nagios XI?

I have tried most of the plugins on Nagios Exchange without any luck.

Any help is appreciated.

Re: Monitoring F5 Load balancers

Posted: Fri Jul 08, 2016 10:18 am
by rkennedy
We do not have any F5s in house to test with, so my first instinct is to point you to the Exchange (https://exchange.nagios.org/index.php?o ... rchword=f5).

If none of these will work, then the next step is looking into what could be created.

That said, there may be custom ways to monitor it. What exactly are you trying to monitor on the device? Does it have any kind of API available for pulling information? Any additional information you can provide will help us figure out the best course.

Re: Monitoring F5 Load balancers

Posted: Tue Jul 12, 2016 7:21 am
by aap
Thanks for your reply.

I have successfully implemented this plugin: https://exchange.nagios.org/directory/P ... f5/details. However, the checks intermittently fail with "Error: Cannot read Server Statistics table".

Has anybody had the same experience?

I am trying to monitor the general health of the device: memory, CPU, fans, active connections, traffic, PSU, etc.

Re: Monitoring F5 Load balancers

Posted: Tue Jul 12, 2016 1:41 pm
by rkennedy
Intermittent results over SNMP usually mean that SNMP is taking a while to respond. It looks like the default timeout is only 10 seconds, which is pretty low for SNMP. Try appending -t 30 to the check so that SNMP has a bit more time to return the values.
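If you want to experiment with timeouts outside of the plugin's own -t option, here is a minimal sketch in Python of a wrapper that enforces a hard limit on any check command and reports UNKNOWN instead of letting the check hang. The function name run_check is hypothetical and not part of Nagios or the F5 plugin; the only thing it relies on is the standard plugin exit code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess

# Standard Nagios plugin exit code for "could not determine state".
UNKNOWN = 3


def run_check(cmd, timeout_s):
    """Run a plugin command with a hard time limit.

    cmd is the argument list of whatever check you already run
    (hypothetical here); timeout_s is the limit in seconds.
    Returns (exit_code, first_available_output).
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left behind.
        return UNKNOWN, f"UNKNOWN: check did not finish within {timeout_s}s"

    output = proc.stdout.strip() or proc.stderr.strip()
    # Anything outside 0..3 (e.g. killed by a signal) is not a valid plugin state.
    code = proc.returncode if proc.returncode in (0, 1, 2, 3) else UNKNOWN
    return code, output
```

Keeping the limit passed here below the engine's own service check timeout means the wrapper gets to answer before the engine kills the job.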

Let us know how that goes!

Re: Monitoring F5 Load balancers

Posted: Thu Jul 14, 2016 1:50 am
by aap
Thanks for the suggestion.

However, it does not make much difference. A few of the checks still intermittently come back with "Service check timed out after 60.01 seconds (maximum timeout is 60s)", which changes the state of the service to CRITICAL. I could increase the check period, but that would only mean it takes longer to check again after a timeout.

At present, there are quite a lot of false alerts due to this and I'm wondering if I can really run this in Production. With this sort of problem, is the issue on the Nagios XI server or the device being monitored?

Any assistance is appreciated.

Re: Monitoring F5 Load balancers

Posted: Thu Jul 14, 2016 9:25 am
by rkennedy
It's hard to say off the bat, but usually the client machine is to blame. There is a possibility you're seeing performance issues on the XI machine though. Can you attach a screenshot of your Admin -> System Status, and Admin -> Monitoring Engine Status pages for us to look at?

Then, PM over a profile for us as well. (Admin -> System Profile -> Download Profile)

In the past, I've seen that failing SNMP connections, for whatever reason, hold the entire connection open, which keeps the Perl scripts running and in turn raises your load quite a bit.

EDIT: Profile received.

Re: Monitoring F5 Load balancers

Posted: Thu Jul 14, 2016 9:35 am
by aap
Thanks.

Screenshot attached. I'll PM the profile.

Re: Monitoring F5 Load balancers

Posted: Thu Jul 14, 2016 2:30 pm
by rkennedy
Got the profile; I'm still looking through it.

Can you also upload the Admin -> Monitoring Engine Status page? I'd like to see the statistics pertaining to the engine.

From what I can tell, some of your checks are timing out on other hosts as well, and Core workers are being killed.

Jul 14 15:30:35 20prdv nagios: wproc: Core Worker 30919: job 214 (pid=32616) timed out. Killing it
Jul 14 15:30:35 20prdv nagios: wproc: CHECK job 214 from worker Core Worker 30919 timed out after 60.01s
Jul 14 15:30:35 20prdv nagios: wproc:   host=x; service=Fan Status;
Jul 14 15:30:35 20prdv nagios: wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Jul 14 15:30:35 20prdv nagios: Warning: Check of service 'Fan Status' on host 'x' timed out after 60.006s!
Jul 14 15:30:35 20prdv nagios: SERVICE ALERT: x;Fan Status;CRITICAL;SOFT;1;(Service check timed out after 60.01 seconds)
Jul 14 15:30:35 20prdv nagios: wproc: Core Worker 30919: job 214 (pid=32616): Dormant child reaped
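To see which hosts and services are timing out most often, something like the following Python sketch could tally the "timed out" warnings from log lines in the format shown above. The function name count_timeouts is hypothetical, and the pattern is only based on the single warning line in this excerpt, so it may need adjusting for other message variants.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... nagios: Warning: Check of service 'Fan Status' on host 'x' timed out after 60.006s!
TIMEOUT_RE = re.compile(
    r"Warning: Check of service '(?P<service>[^']+)' "
    r"on host '(?P<host>[^']+)' timed out after (?P<secs>[\d.]+)s"
)


def count_timeouts(lines):
    """Return a Counter keyed by (host, service) for timed-out service checks."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("service"))] += 1
    return counts


# Two lines copied from the excerpt above; only the second is a timeout warning.
SAMPLE = [
    "Jul 14 15:30:35 20prdv nagios: wproc:   host=x; service=Fan Status;",
    "Jul 14 15:30:35 20prdv nagios: Warning: Check of service 'Fan Status' "
    "on host 'x' timed out after 60.006s!",
]

for (host, service), n in count_timeouts(SAMPLE).most_common():
    print(f"{host} / {service}: {n} timeout(s)")
```

Feeding it the lines of your own log file (for example via open(path) on wherever your system writes the nagios messages) would show whether the timeouts are concentrated on the F5 or spread across other hosts.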

Re: Monitoring F5 Load balancers

Posted: Fri Jul 15, 2016 4:49 am
by aap
Urgent! Please check DM

Re: Monitoring F5 Load balancers

Posted: Fri Jul 15, 2016 9:47 am
by rkennedy
Responded. Please post the requested screenshot.