NETAPP + SNMP TIMEOUTS + UNNECESSARY NOTIFICATIONS

ebardellidoxee
Posts: 6
Joined: Wed Jul 10, 2013 6:00 am

[RESOLVED] NETAPP + SNMP TIMEOUTS + UNNECESSARY NOTIFICATION

Post by ebardellidoxee »

After weeks of experimenting with a ton of workarounds, I've finally arrived at a fairly stable solution, and I would love to share it as it could help someone else.

1 ) As predicted, NetApp's support didn't help that much.
For them, SNMP has a very low priority: SNMP requests are not processed while the storage is busy doing something else, and it's normal for SNMP packets to be dropped…
The only good hint they eventually provided was to query SNMP on the NetApp's e0M NIC, which is dedicated to management traffic and should be less busy.

2 ) Originally, I was using the check_naf.py plugin, written in Python by team(ix).
Even though this is a really cool plugin with lots of features, and I am still using it to monitor the filers' OPS, I/O, ..., I decided to switch to a different plugin to monitor volume usage.
I tried a different plugin, check-netapp-ng.pl, and worked to refine and tune it, mainly for these reasons: it's written in Perl (and I am definitely more comfortable with Perl), it uses the Perl module Net::SNMP (rather than the Linux tools snmpget and snmpwalk used by check_naf.py), and I found it a little more reliable. (No really strong evidence, though. Just a better feeling.)
However, check-netapp-ng.pl isn't as advanced and well engineered as check_naf.py, and I needed to write a few lines and fix a couple of bugs to replicate the check_naf.py functionality I needed.

3 ) As suggested in the posts above, the ultimate way to fix this problem is to work on timeouts.
I identified and tuned 3 different timeouts:

Nagios service check timeout (nagios.cfg) --- > service_check_timeout=240 --- > originally 30 sec
Perl plugins timeout (utils.pm) --- > $TIMEOUT = 180; --- > originally 15 sec
Net::SNMP timeout (check-netapp-ng.pl) --- > my ($sess, $err) = Net::SNMP->session( -hostname => $server, -version => $version, -community => $comm, -timeout=> 60); [60 sec is the max] --- > originally 5 sec

I think of these values as a Russian matryoshka doll: the Net::SNMP timeout is the innermost layer and the service check timeout is the outermost one. I set the values so that each layer has enough time to finish before the layer enclosing it expires.
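
The nesting rule can be sanity-checked with a few lines of code. The following is only a rough Python sketch, not part of either plugin: the function names are made up for illustration, and it assumes that Net::SNMP is left at its default of one retry, so a single request can take up to twice the configured timeout. A plugin that issues several SNMP requests in a row can of course take longer than this single-request bound.

```python
def worst_case_snmp_seconds(timeout, retries=1):
    """Upper bound for ONE SNMP request: the first attempt plus each retry.

    retries=1 is an assumption (the Net::SNMP default); adjust it if the
    plugin passes its own -retries value.
    """
    return timeout * (retries + 1)


def check_nesting(snmp_timeout, plugin_timeout, service_check_timeout, retries=1):
    """Return a list of problems; an empty list means the layers nest cleanly."""
    problems = []
    snmp_worst = worst_case_snmp_seconds(snmp_timeout, retries)
    if snmp_worst >= plugin_timeout:
        problems.append(
            "one SNMP request may take %ss, but the plugin gives up after %ss"
            % (snmp_worst, plugin_timeout)
        )
    if plugin_timeout >= service_check_timeout:
        problems.append(
            "plugin timeout %ss is not below service_check_timeout %ss"
            % (plugin_timeout, service_check_timeout)
        )
    return problems


if __name__ == "__main__":
    # Values from this post: Net::SNMP 60s, utils.pm 180s, nagios.cfg 240s.
    print(check_nesting(60, 180, 240))
    # A hypothetical bad combination: the plugin would be killed mid-retry.
    print(check_nesting(60, 100, 240))
```

With the values above, the worst case for one request is 120 seconds under that retry assumption, which still fits inside the 180-second plugin timeout and the 240-second service check timeout.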

Hope this might help!
Thanks for your support

Emanuele
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: NETAPP + SNMP TIMEOUTS + UNNECESSARY NOTIFICATIONS

Post by abrist »

Thanks for the elaborate update! If you get a chance, it may be helpful to create a diff/patch and post it on the exchange page for the plugins in question. Who knows, your additions may get rolled up into the respective plugins' next releases.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.