
SNMP Timeout Issues and Unknown Storage after Upgrade to 5.7

Posted: Mon Jul 27, 2020 7:47 am
Hi there. Since upgrading Nagios to 5.7.2, I am seeing an increase in failures or unknowns on several SNMP checks.

On several storage disk checks we are now seeing the following error, with the drive letter varying from check to check:

Unknown storage : ^J: : ERROR

I am also seeing "ERROR: General time-out (Alarm signal)" and "ERROR: Alarm signal (Nagios time-out)".

These checks were operating correctly before the upgrade. Any help would be appreciated.

Re: SNMP Timeout Issues and Unknown Storage after Upgrade to

Posted: Mon Jul 27, 2020 2:11 pm
by benjaminsmith
Hi,

General or intermittent timeout errors are often related to network congestion. Most plugins have a timeout option (-t) to increase the time allowed beyond the default.

Try running the failing checks directly from the terminal and increase the timeout to see if that resolves the error. If so, you can adjust the check command in the CCM (Core Configuration Manager) accordingly.

For instructions on how to test directly from the command line, please see the following KB article:

Nagios XI - How To Test Check Commands From The Command-line
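
As a rough illustration of that test, the sketch below re-runs a plugin command with increasing -t values and prints the exit code of each attempt. The helper name try_timeouts and the 15/30/60 second steps are my own assumptions, not something from the KB article. It also assumes the plugin reports a timeout with the standard UNKNOWN exit code (3), so check that against your plugin's behaviour.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): re-run a plugin command with
# increasing -t values and report the exit code of each attempt.
# Standard plugin exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
try_timeouts() {
    rc=3
    for t in 15 30 60; do          # assumed steps, adjust as needed
        if "$@" -t "$t"; then
            rc=0
        else
            rc=$?
        fi
        echo "timeout=${t}s exit=${rc}"
        # Stop as soon as the plugin returns something other than UNKNOWN.
        if [ "$rc" -ne 3 ]; then
            return "$rc"
        fi
    done
    return "$rc"
}

# Example call, with placeholders for host and community string:
# try_timeouts /usr/local/nagios/libexec/check_snmp_storage.pl \
#     -H xxx.xxx.xxx.xxx -C 'communitystring' --v2c -m '^D:' -w 85 -c 90 -f
```

If the check only succeeds at the larger values, that points at slow SNMP responses rather than a configuration problem.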

Let me know if that resolves the issue for you.

Re: SNMP Timeout Issues and Unknown Storage after Upgrade to

Posted: Tue Jul 28, 2020 3:00 am
Hi,

I have changed the timeout value globally to 60s, but this has not resolved the issue. I ran the following command from a PuTTY terminal session, and it returns the correct information.

Command run (community string and IP address changed):

/usr/local/nagios/libexec/check_snmp_storage.pl -H xxx.xxx.xxx.xxx -C 'communitystring' --v2c -m ^D: -w 85 -c 90 -f -t 60

Returned:

D:\ Label: Serial Number 643b0d6e: 21%used(21144MB/102398MB) (<85%) : OK | 'D:\_Label:__Serial_Number_643b0d6e'=21144MB;87038;92158;0;102398

Re: SNMP Timeout Issues and Unknown Storage after Upgrade to

Posted: Tue Jul 28, 2020 2:19 pm
by benjaminsmith
Hi

Thanks for testing that out, some good data here. So the check command is working from the CLI but not the GUI. A few items to check out to help narrow this down.

1. When the timeout was globally set to 60, was this done in the command line in the CCM or in the nagios.cfg file? If it was done in nagios.cfg, try updating the check_command with the -t 60 option.

2. In the CCM, are you using the IP address or the hostname? If it is the latter, try changing this to the IP address in case the hostname cannot be resolved.

3. Lastly, since you are passing a regular expression (^D:) in the command string, try wrapping the $ARGn$ field where you pass it in single quotes. The ^ character could be causing an issue.
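
For item 3, one quick way to sanity-check a quoted pattern outside Nagios is to match it against the storage description the plugin printed at the CLI. This is only a sketch: the helper name matches_storage is made up, and it assumes grep -E treats a simple anchored pattern like ^D: the same way the plugin's regular expression matching does.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the pattern matches a storage description.
# The caller single-quotes the pattern so the shell passes ^ through literally.
matches_storage() {
    pattern=$1
    descr=$2
    printf '%s\n' "$descr" | grep -E -q -- "$pattern"
}

# Description taken from the CLI output earlier in the thread.
descr='D:\ Label:  Serial Number 643b0d6e'

if matches_storage '^D:' "$descr"; then echo "^D: matches"; else echo "^D: no match"; fi
if matches_storage '^J:' "$descr"; then echo "^J: matches"; else echo "^J: no match"; fi
```

If the quoted pattern matches here but the service still reports "Unknown storage", the pattern itself is probably fine and the problem is more likely in how the check reaches the host.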

If you are not able to resolve it, can you send me your system profile and the exact name of the service so I can review the configurations? Thanks, Benjamin

To send us your system profile:
1. Log in to the Nagios XI GUI using a web browser.
2. Click the "Admin" > "System Profile" menu.
3. Click the "Download Profile" button.
4. Save the profile.zip file and share it in a private message or upload it to the post/ticket, and then reply to this post to bring it up in the queue.