NetApp - SNMP monitoring issue
Posted: Mon Aug 05, 2019 11:49 am
by NMFSTeam
Not sure if there is anything that can be done about this, but over the last few days we started receiving alerts from Nagios regarding our NetApp, specifically the shelf status. We are using a default SNMP check, and all the other checks are working fine (cpu, autosupport status, disks, shelf info, etc.). The only one that is alerting us is "shelf status." We are receiving the error: (No output on stdout) stderr:
Normally, after ten minutes, it reverts back to OK. Then, later in the day, we will receive another alert, only for it to clear itself after ten minutes.
We have checked the NetApp filer, and there do not appear to be any issues with it. Could it just be network issues? Thanks.
Re: NetApp - SNMP monitoring issue
Posted: Mon Aug 05, 2019 1:34 pm
by ssax
It depends on which plugin you are using. Please SSH into the XI server and run the check command from the CLI as the nagios user:
Code:
su - nagios
/usr/local/nagios/libexec/YOURFULLCHECKCOMMAND -with -arguments
Then send us the entire output.
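Since the alert text is "(No output on stdout) stderr:", it can help to capture stdout, stderr, and the exit status separately when running the check by hand. The following is only a minimal sketch: `capture_check` is a hypothetical helper name, not part of Nagios XI or of any plugin, and it assumes `mktemp` is available on the server.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios XI): run a check command and
# report its stdout, stderr and exit status as three separate sections.
# Usage: capture_check COMMAND [ARGS...]
capture_check() {
    out_file=$(mktemp) || return 3
    err_file=$(mktemp) || { rm -f "$out_file"; return 3; }

    # Record the exit status without aborting if the check fails.
    status=0
    "$@" >"$out_file" 2>"$err_file" || status=$?

    echo "--- stdout ---"
    cat "$out_file"
    echo "--- stderr ---"
    cat "$err_file"
    echo "--- exit status: $status ---"

    rm -f "$out_file" "$err_file"
    return 0
}

# Example call, with the same placeholder as above:
# capture_check /usr/local/nagios/libexec/YOURFULLCHECKCOMMAND -with -arguments
```

Nagios plugins conventionally exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN), so an exit status above 128 usually means the plugin was killed by a signal rather than finishing normally.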
If you need help deciphering what that check command is, please PM me a copy of your profile. You can download it from
Admin > System Profile > Download Profile.
If you're unable to generate the profile through the web interface, please try generating it from the command line by running these commands as root:
Code:
rm -rf /usr/local/nagiosxi/var/components/profile*
/usr/local/nagiosxi/html/includes/components/profile/getprofile.sh SUPPORT
Then send me the resulting
/usr/local/nagiosxi/var/components/profile.zip file.
If the profile script fails, please include the ENTIRE output.
Re: NetApp - SNMP monitoring issue
Posted: Mon Aug 05, 2019 6:21 pm
by NMFSTeam
I am unable to determine the check that is being used. I have sent a PM with the profile. Thank you.
Re: NetApp - SNMP monitoring issue
Posted: Tue Aug 06, 2019 10:30 am
by NMFSTeam
Using the Core Config Manager, I was able to determine the check being used. (Of course, now it's working fine.)
Code:
[nagios@nagios01 libexec]$ ./check-netapp-ng.pl -H 192.168.0.25 -C snmpstring -T SHELF
VoltOverFail->None VoltUnderFail->None TempUnderFail->None PsFail->None TempOver->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None FanFail->None TempUnderWarn->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
OK: SHELF ok | shelf=0
Re: NetApp - SNMP monitoring issue
Posted: Tue Aug 06, 2019 3:31 pm
by ssax
Please attach this file:
Code:
/usr/local/nagios/libexec/check-netapp-ng.pl
More than likely you need to set a timeout on it (or increase some other timeout along the path). I'll investigate that while you send me the file so that I can look at that specific version.
Generally, when load gets high on a system, SNMP data is the first thing to get dropped (or it gets lower priority). Adjusting your max_check_attempts to account for these situations can help, but usually it just takes increasing a timeout somewhere, since SNMP can take a while to respond when load is high on a system.
Additionally, what is the output of this command?
Code:
time /usr/local/nagios/libexec/check-netapp-ng.pl -H 192.168.0.25 -C snmpstring -T SHELF
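Because the failure is intermittent, a single timed run can easily land in a good period. A loop along these lines could sample the check repeatedly and log how long each run took. This is only a sketch: `sample_check` is a hypothetical helper name, and it assumes `date +%s` is supported (true for GNU and BSD `date`).

```shell
#!/bin/sh
# Hypothetical helper: run a command COUNT times, INTERVAL seconds apart,
# printing the elapsed wall-clock seconds and exit status of each run.
# Usage: sample_check COUNT INTERVAL COMMAND [ARGS...]
sample_check() {
    count=$1
    interval=$2
    shift 2

    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s)
        status=0
        # Discard the plugin output; only timing and exit status are logged.
        "$@" >/dev/null 2>&1 || status=$?
        end=$(date +%s)
        echo "run=$i elapsed=$((end - start))s exit=$status"

        i=$((i + 1))
        # Pause between runs, but not after the last one.
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}

# Example: 20 samples, 30 seconds apart, using the command from this thread.
# sample_check 20 30 /usr/local/nagios/libexec/check-netapp-ng.pl -H 192.168.0.25 -C snmpstring -T SHELF
```

Runs whose elapsed time approaches the configured timeout, or whose exit status is above 128, are the ones worth comparing against the alert times.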
Re: NetApp - SNMP monitoring issue
Posted: Tue Aug 06, 2019 7:00 pm
by NMFSTeam
Files are being sent via PM now. I actually found two files; perhaps the other one would work better?
Here is the output of the command:
Code:
[root@nagios01 nagios]# time /usr/local/nagios/libexec/check-netapp-ng.pl -H 192.168.0.25 -C snmpstring -T SHELF
VoltOverFail->None VoltUnderFail->None TempUnderFail->None PsFail->None TempOver->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None FanFail->None TempUnderWarn->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
OK: SHELF ok | shelf=0
real 0m8.020s
user 0m0.180s
sys 0m0.017s
Re: NetApp - SNMP monitoring issue
Posted: Tue Aug 06, 2019 7:09 pm
by NMFSTeam
The issue is happening NOW, so I went ahead and ran the commands again...
Code:
[root@nagios01 nagios]# /usr/local/nagios/libexec/check-netapp-ng.pl -H 192.168.0.25 -C snmpstring -T SHELF
VoltOverFail->None VoltUnderFail->None TempUnderFail->None PsFail->None TempOver->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None FanFail->None TempUnderWarn->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
Alarm clock
[root@nagios01 nagios]# time /usr/local/nagios/libexec/check-netapp-ng.pl -H 192.168.0.25 -C snmpstring -T SHELF
VoltOverFail->None VoltUnderFail->None TempUnderFail->None PsFail->None TempOver->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None FanFail->None TempUnderWarn->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
VoltOverFail->None VoltUnderFail->None TempUnderFail->None TempOver->None PsFail->None ElectFail->None VoltUnderWarn->None VoltOverWarn->None TempUnderWarn->None FanFail->None TempOverFail->None
Alarm clock
real 0m15.066s
user 0m0.159s
sys 0m0.011s
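A note on the "Alarm clock" line: that is the message a shell prints when a child process is terminated by an unhandled SIGALRM. Together with the run time of about 15 seconds, it is consistent with a timer firing before the SNMP queries finished, although that is an inference from the output above rather than something confirmed from the script source. When Nagios runs the plugin, stdout is a pipe rather than a terminal, so buffered output would likely be lost when the process is killed, which would fit the "(No output on stdout)" alert. The sketch below reproduces the same kind of termination in plain shell and shows how to recognize it from the exit status.

```shell
#!/bin/sh
# Start a child shell that sends SIGALRM to itself. No handler is installed,
# so the default action applies and the child is terminated by the signal.
status=0
sh -c 'kill -ALRM $$' || status=$?

# A process killed by a signal is reported with an exit status above 128;
# "kill -l STATUS" translates that status back to the signal name.
signal_name=$(kill -l "$status")
echo "child exit status: $status (SIG$signal_name)"
```

On Linux, SIGALRM is signal 14, so the status reported here would normally be 142.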
Re: NetApp - SNMP monitoring issue
Posted: Wed Aug 07, 2019 5:08 pm
by ssax
Please add this to the top of the script (after the 1st line):
Then see if that resolves the issue.
Re: NetApp - SNMP monitoring issue
Posted: Tue Aug 13, 2019 10:25 am
by NMFSTeam
I implemented the change you suggested and that seems to have fixed things. No more errors from Nagios. Thank you very much for your assistance.
Re: NetApp - SNMP monitoring issue
Posted: Tue Aug 13, 2019 4:36 pm
by mbellerue
Glad to hear it's working! Closing thread.