SNMP Service Check Timeout

christiandunn1
Posts: 18
Joined: Wed Mar 23, 2011 6:32 pm
Location: Edmonton


Post by christiandunn1 »

Linux Distribution and version? RHEL 7.2
32 or 64bit? 64-bit
VMware Image or Manual Install of XI? Manual
Are there special configurations on your system, e.g., is GNOME installed? Are you using a proxy? Are you using SSL? We are using SSL.

Hi,

Ever since we performed an update in fall 2016, we've been having issues with SNMP service checks timing out. We have many other active SNMP checks working fine on the same targets (disk space, extend commands, etc.). Another thread appears to describe the same issue, with the same time frame and symptoms (https://support.nagios.com/forum/viewto ... 16&t=40520).

We are seeing about 200 of these timeouts per hour, and they appear to hit each of our roughly 200 hosts.

While the above thread mentioned network issues, it should be noted that this issue happens even when the Nagios server and the target are in the same blade chassis and on the same network, and we aren't seeing any saturation.

This is happening on both Windows and Linux servers using the checks below:

Windows:
$USER1$/check_snmp_win.pl -H $HOSTADDRESS$ -C xxxx --v2c -n 'BES Client'

Linux:
$USER1$/check_snmp_process_wizard.pl -H $HOSTADDRESS$ -C xxxx --v2c -n 'BESClient' -w '0' -c '0'

I've attached a screenshot of the errors. They don't seem to remain for more than a minute at a time.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: SNMP Service Check Timeout

Post by ssax »

Does your SNMP community name have special characters in it?

What is the output of these commands:
- Make sure to change YOURDEVICE to the IP or DNS name of the remote hosts
- Make sure to change YOURCOMMUNITY to your SNMP community


nmap -sU -p161 YOURDEVICE
snmpwalk -v 2c -c 'YOURCOMMUNITY' YOURDEVICE:161
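Because nmap reports a UDP port that sends no reply as `open|filtered`, a single SNMP GET is a more direct yes/no test of whether the agent answers. The sketch below is a hypothetical helper (`snmp_probe` is not part of Nagios XI); it assumes the Net-SNMP command-line tools are installed and that `snmpget` exits non-zero when no reply arrives. It queries sysUpTime.0 (OID 1.3.6.1.2.1.1.3.0) with a 2-second timeout and no retries, so a dropped request or reply is reported instead of being hidden by a retry.

```bash
#!/usr/bin/env bash
# Hypothetical helper: send one SNMPv2c GET and report whether the agent answered.
# Assumes the Net-SNMP tools (snmpget) are installed on the Nagios server.
snmp_probe() {
  local host=$1 community=$2
  # 1.3.6.1.2.1.1.3.0 is sysUpTime.0; -t 2 waits 2 s, -r 0 disables retries
  if snmpget -v 2c -c "$community" -t 2 -r 0 "$host" 1.3.6.1.2.1.1.3.0 >/dev/null 2>&1; then
    echo "$host: response"
  else
    echo "$host: no response"
  fi
}
```

Example: `snmp_probe YOURDEVICE 'YOURCOMMUNITY'`, run a few times in a row against the same host.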
christiandunn1
Posts: 18
Joined: Wed Mar 23, 2011 6:32 pm
Location: Edmonton

Re: SNMP Service Check Timeout

Post by christiandunn1 »

The community string does not have any special characters.

[root@prdmon1 ~]# nmap -sU -p161 prdsdb1

Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-31 15:02 MST
Nmap scan report for prdsdb1 (199.214.10.68)
Host is up (0.00096s latency).
rDNS record for 199.214.10.68: tstsanrep.agric.gov.ab.ca
PORT STATE SERVICE
161/udp open|filtered snmp

Nmap done: 1 IP address (1 host up) scanned in 0.27 seconds

I've attached the results of second command.
snmpwalk.txt
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: SNMP Service Check Timeout

Post by rkennedy »


HOST-RESOURCES-MIB::hrSWRunName.1456 = STRING: "BESClient.exe"
I noticed this in the walk. What happens if you search for "BESClient.exe"; does it work properly? Are you able to run it properly from the command line (from the /usr/local/nagios/libexec/ directory)?
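To check the exact process name the agent reports, the walk can be narrowed to the HOST-RESOURCES-MIB process table and reduced to bare names. `list_processes` is a hypothetical helper; it assumes the Net-SNMP `snmpwalk` output format shown in the line above. Note that it lists process names only, not Windows service names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the process names (hrSWRunName) an SNMP agent reports.
# Assumes output lines like:
#   HOST-RESOURCES-MIB::hrSWRunName.1456 = STRING: "BESClient.exe"
list_processes() {
  local host=$1 community=$2
  snmpwalk -v 2c -c "$community" "$host" HOST-RESOURCES-MIB::hrSWRunName |
    sed -n 's/.*hrSWRunName\.[0-9][0-9]* = STRING: "\(.*\)"$/\1/p'
}
```

Example: `list_processes prdsdb1 'xxxx' | grep -ix 'besclient.exe'` prints the name only if a process with exactly that name (ignoring case) is running.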
Former Nagios Employee
christiandunn1
Posts: 18
Joined: Wed Mar 23, 2011 6:32 pm
Location: Edmonton

Re: SNMP Service Check Timeout

Post by christiandunn1 »

I don't believe adding .exe will work: as shown below, the check seems to require the service name rather than the executable name. The current check does work, but only intermittently.

[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client.exe'
No services matching "BES Client.exe" found : CRITICAL

This was interesting. I ran the current check 5 times in under 15 seconds and got 2 timeouts which lasted 5 seconds each. This behavior is repeatable across multiple servers and platforms.
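The repeat test described above can be scripted so that each attempt is timestamped and the outcomes are tallied. `repeat_check` is hypothetical; it classifies a run as a timeout by the `Alarm signal` text the plugin prints and as OK by the trailing `: OK`, which matches the outputs shown in this thread but may not cover every plugin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a check command N times and tally the outcomes.
# Usage: repeat_check N command [args...]
repeat_check() {
  local runs=$1; shift
  local ok=0 timeouts=0 other=0 out i
  for ((i = 1; i <= runs; i++)); do
    out=$("$@" 2>&1)
    case $out in
      *"Alarm signal"*) timeouts=$((timeouts + 1)) ;;  # plugin hit its -t limit
      *": OK"*)         ok=$((ok + 1)) ;;
      *)                other=$((other + 1)) ;;
    esac
    printf '%s run %d: %s\n' "$(date +%T)" "$i" "$out"
  done
  printf 'total=%d ok=%d timeouts=%d other=%d\n' "$runs" "$ok" "$timeouts" "$other"
}
```

Example, from /usr/local/nagios/libexec: `repeat_check 5 ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'`. The timestamps make it easier to line up failures with entries in other logs.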

[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
ERROR: Alarm signal (Nagios time-out)
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
1 services active (matching "BES Client") : OK
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
1 services active (matching "BES Client") : OK
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
ERROR: Alarm signal (Nagios time-out)
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
1 services active (matching "BES Client") : OK

I understand how this can look like a network issue, but it seems strange that it began overnight on all hosts, both Windows and Linux, in the same time frame as another user here.

We had run this same check for years without ever seeing this particular Nagios time-out error. I've attached a screenshot showing roughly 30 occurrences in 20 minutes, affecting both virtual and physical hosts.
log.jpg
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: SNMP Service Check Timeout

Post by rkennedy »


[root@centos7 libexec]# ./check_snmp_win.pl
Put snmp login info!
Usage: check_snmp_win [-v] -H <host> -C <snmp_community> [-2] | (-l login -x passwd) [-p <port>] -n <name>[,<name2] [-T=service] [-r] [-s] [-N=<n>] [-t <timeout>] [-V]
What happens if you run it with -t 60? Are your results consistent? I've seen SNMP take up to a few minutes to respond in the past; honestly, it's not much to be concerned about.
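To tell a slow reply from a request that never gets answered, it helps to time each attempt: if failures always take exactly the `-t` value while successes return quickly, the plugin is waiting out its whole timeout rather than receiving a late response. `time_check` is a hypothetical wrapper and assumes GNU `date` (for `%N` nanoseconds), as shipped with RHEL.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run one command, then print its output, exit status and wall-clock time.
# Assumes GNU date, which supports %N (nanoseconds).
time_check() {
  local start end rc out
  start=$(date +%s%N)
  out=$("$@" 2>&1)
  rc=$?                          # exit status of the command inside $(...)
  end=$(date +%s%N)
  printf '%s | exit=%d | %d ms\n' "$out" "$rc" $(( (end - start) / 1000000 ))
}
```

Example: `time_check ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client' -t 60`. The variables are declared with `local` before the assignment so that `$?` reflects the check itself, not the `local` builtin.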
Former Nagios Employee
christiandunn1
Posts: 18
Joined: Wed Mar 23, 2011 6:32 pm
Location: Edmonton

Re: SNMP Service Check Timeout

Post by christiandunn1 »

When setting the -t to 60 I get the same behavior except it takes 60 seconds to error instead of the previous 5 seconds.

[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client' -t 60
1 services active (matching "BES Client") : OK
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client' -t 60
ERROR: Alarm signal (Nagios time-out)

While the above command was timing out, I was able to run the same check successfully several times in a second session to the Nagios server.

[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
1 services active (matching "BES Client") : OK
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
ERROR: Alarm signal (Nagios time-out)
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
ERROR: Alarm signal (Nagios time-out)
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
1 services active (matching "BES Client") : OK
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
1 services active (matching "BES Client") : OK
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: SNMP Service Check Timeout

Post by rkennedy »

Odd inconsistencies. Can you try running the commands with strace in front of them? It will walk us through what the process is doing and provide some debugging information. We'll need to see one run in a working state and one in a failing state (usually judged by the output near the end of the trace). You might need to install strace first.
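A sketch of capturing one trace per run, assuming strace is installed (`yum install strace` on RHEL/CentOS). `trace_check` is hypothetical; `-f` follows child processes, `-tt` adds microsecond timestamps and `-T` shows the time spent in each system call, which makes a long wait on the SNMP socket easy to spot. The final `grep` is only a rough filter for socket calls addressed to port 161 and for the `SIGALRM` that, presumably, the plugin's timeout raises.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command under strace and keep the trace in /tmp.
# Usage: trace_check LABEL command [args...]
trace_check() {
  local label=$1; shift
  local file="/tmp/strace-${label}.txt"
  # -f: follow forks, -tt: microsecond timestamps, -T: time spent in each syscall
  strace -f -tt -T -o "$file" "$@"
  echo "trace written to $file" >&2
  # Rough filter: socket calls addressed to port 161 and any SIGALRM delivery
  grep -E 'htons\(161\)|SIGALRM' "$file"
}
```

Example: run `trace_check good ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'`, then repeat with the label `bad` until a timeout is captured, and compare the two files.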
Former Nagios Employee
christiandunn1
Posts: 18
Joined: Wed Mar 23, 2011 6:32 pm
Location: Edmonton

Re: SNMP Service Check Timeout

Post by christiandunn1 »

Ok I've run the commands with strace and have uploaded a success and a failure.
stracegood.txt
stracebad.txt
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: SNMP Service Check Timeout

Post by avandemore »

The plugin you are using is actually quite simple and sends a standard SNMP query. This is almost certainly caused by some type of networking issue or by the SNMP host simply not responding in time. The strace makes it clear that the plugin is simply timing out.

Do you consistently see the packets arrive at the destination? And what of their route back?
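One way to answer both questions is to capture SNMP traffic at each end while the check is repeated, then compare which requests were answered. `capture_snmp` is a hypothetical wrapper around tcpdump; it assumes root access and a Linux host (on a Windows target, a different capture tool such as Wireshark would be needed).

```bash
#!/usr/bin/env bash
# Hypothetical helper: record SNMP (UDP 161) traffic to or from one peer into a pcap file.
# Run as root on the Nagios server, and on a Linux target with the Nagios server as PEER.
# Usage: capture_snmp PEER [OUTFILE]   (stop with Ctrl-C)
capture_snmp() {
  local peer=$1 file=${2:-/tmp/snmp-$1.pcap}
  # -n: no DNS lookups, -i any: all interfaces, -w: write raw packets for later review
  tcpdump -n -i any -w "$file" "udp port 161 and host $peer"
}
```

Afterwards, `tcpdump -n -r /tmp/snmp-prdsdb1.pcap` lists the captured packets, so a request seen leaving the Nagios server but never arriving at the target (or a reply that leaves the target but never returns) points to where it is being lost.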
Previous Nagios employee