SNMP Service Check Timeout
-
christiandunn1
- Posts: 18
- Joined: Wed Mar 23, 2011 6:32 pm
- Location: Edmonton
SNMP Service Check Timeout
Linux Distribution and version? RHEL 7.2
32 or 64bit? 64-bit
VMware Image or Manual Install of XI? Manual
Are there special configurations on your system, ie; is Gnome installed? Are you using a proxy? Are you using SSL? We are using SSL.
Hi,
Ever since we performed an update in Fall 2016 we've been having issues with SNMP service checks timing out. We have many other active SNMP checks working fine on the same targets (disk space, extend commands, etc.). This issue seems to have been encountered in another thread as well, with the same time frame and symptoms (https://support.nagios.com/forum/viewto ... 16&t=40520).
We are seeing about 200 instances of this timeout per hour and it appears to hit each of our roughly 200 hosts.
While the above thread mentioned network issues, it should be noted that this issue happens even when the Nagios server and the target are in the same blade chassis and network, and we aren't seeing any saturation.
This is happening on both Windows and Linux servers using the checks below:
Windows:
$USER1$/check_snmp_win.pl -H $HOSTADDRESS$ -C xxxx --v2c -n 'BES Client'
Linux:
$USER1$/check_snmp_process_wizard.pl -H $HOSTADDRESS$ -C xxxx --v2c -n 'BESClient' -w '0' -c '0'
I've attached a screenshot of the errors. They don't seem to remain for more than a minute at a time.
Re: SNMP Service Check Timeout
Does your SNMP community name have special characters in it?
What is the output of these commands:
- Make sure to change YOURDEVICE to the IP or DNS name of the remote hosts
- Make sure to change YOURCOMMUNITY to your SNMP community
nmap -sU -p161 YOURDEVICE
snmpwalk -v 2c -c 'YOURCOMMUNITY' YOURDEVICE:161
-
christiandunn1
- Posts: 18
- Joined: Wed Mar 23, 2011 6:32 pm
- Location: Edmonton
Re: SNMP Service Check Timeout
The community string does not have any special characters.
[root@prdmon1 ~]# nmap -sU -p161 prdsdb1
Starting Nmap 6.47 ( http://nmap.org ) at 2017-01-31 15:02 MST
Nmap scan report for prdsdb1 (199.214.10.68)
Host is up (0.00096s latency).
rDNS record for 199.214.10.68: tstsanrep.agric.gov.ab.ca
PORT STATE SERVICE
161/udp open|filtered snmp
Nmap done: 1 IP address (1 host up) scanned in 0.27 seconds
I've attached the results of the second command.
Re: SNMP Service Check Timeout
HOST-RESOURCES-MIB::hrSWRunName.1456 = STRING: "BESClient.exe"
Former Nagios Employee
-
christiandunn1
- Posts: 18
- Joined: Wed Mar 23, 2011 6:32 pm
- Location: Edmonton
Re: SNMP Service Check Timeout
I don't believe adding the .exe will work, as the check seems to require the service name rather than the executable name, as shown below. The current check does work, but only intermittently.
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client.exe'
No services matching "BES Client.exe" found : CRITICAL
This was interesting. I ran the current check 5 times in under 15 seconds and got 2 timeouts which lasted 5 seconds each. This behavior is repeatable across multiple servers and platforms.
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
ERROR: Alarm signal (Nagios time-out)
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
1 services active (matching "BES Client") : OK
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
1 services active (matching "BES Client") : OK
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
ERROR: Alarm signal (Nagios time-out)
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
1 services active (matching "BES Client") : OK
I understand how this can seem like a network issue, but it just seems strange that it began overnight on all hosts, both Windows and Linux, during the same time frame as another user here.
We had run this same check for years without ever seeing this particular Nagios time-out error. I've attached a screenshot showing roughly 30 occurrences in 20 minutes affecting hosts both virtual and physical.
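For anyone who wants to put a number on the intermittent failures described above, here is a minimal sketch that runs a check repeatedly and tallies the results. The function name tally_timeouts is hypothetical (it is not part of any Nagios plugin), and it assumes the check prints the same two strings seen in this thread: "Alarm signal" on a timeout and ": OK" on success.

```shell
# Run a check command several times and tally OK results vs. timeouts.
# tally_timeouts is a hypothetical helper, not part of Nagios.
# Usage: tally_timeouts <runs> <command> [args...]
# e.g.   tally_timeouts 50 ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
tally_timeouts() (
    runs=$1
    shift
    ok=0 timeouts=0 other=0 i=0
    while [ "$i" -lt "$runs" ]; do
        # Capture stdout and stderr of one run of the check
        out=$("$@" 2>&1)
        case $out in
            *"Alarm signal"*) timeouts=$((timeouts + 1)) ;;
            *": OK"*)         ok=$((ok + 1)) ;;
            *)                other=$((other + 1)) ;;
        esac
        i=$((i + 1))
    done
    echo "runs=$runs ok=$ok timeouts=$timeouts other=$other"
)
```

Running it against a few hosts, both from the Nagios server and from another machine on the same network, may help show whether the failure rate follows the server, the target, or neither.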
Re: SNMP Service Check Timeout
[root@centos7 libexec]# ./check_snmp_win.pl
Put snmp login info!
Usage: check_snmp_win [-v] -H <host> -C <snmp_community> [-2] | (-l login -x passwd) [-p <port>] -n <name>[,<name2] [-T=service] [-r] [-s] [-N=<n>] [-t <timeout>] [-V]
Former Nagios Employee
-
christiandunn1
- Posts: 18
- Joined: Wed Mar 23, 2011 6:32 pm
- Location: Edmonton
Re: SNMP Service Check Timeout
When setting -t to 60 I get the same behavior, except it takes 60 seconds to error instead of the previous 5 seconds.
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client' -t 60
1 services active (matching "BES Client") : OK
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client' -t 60
ERROR: Alarm signal (Nagios time-out)
While the above command was timing out, I ran the same check in a second session on the Nagios server and it succeeded several times.
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
1 services active (matching "BES Client") : OK
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
ERROR: Alarm signal (Nagios time-out)
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
ERROR: Alarm signal (Nagios time-out)
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
1 services active (matching "BES Client") : OK
[root@prdmon1 libexec]# ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
1 services active (matching "BES Client") : OK
Re: SNMP Service Check Timeout
Odd inconsistencies. Can you try running strace in front of the commands? It'll walk us through what's occurring in the process and provide a bit of debugging. We'll need to see one run in a working state and one in a failing state (usually judged by an echo towards the end of it all). You might need to install strace.
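One way to collect a working and a failing trace, as a rough sketch: the helper below runs the check under strace and renames the trace file according to the result. The name trace_check and the file naming are assumptions made for illustration, and it treats any output containing "Alarm signal" as a failure, as seen earlier in this thread.

```shell
# Trace one run of a check and label the trace file by outcome.
# trace_check is a hypothetical helper; the file names are illustrative only.
# Usage: trace_check <output_dir> <command> [args...]
# e.g.   trace_check /tmp ./check_snmp_win.pl -H prdsdb1 -C xxxx -n 'BES Client'
trace_check() (
    dir=$1
    shift
    tmp=$(mktemp "$dir/check.XXXXXX")
    # -f follows child processes, -tt timestamps each call, -o writes the trace to a file
    out=$(strace -f -tt -o "$tmp" "$@" 2>&1)
    case $out in
        *"Alarm signal"*) label=fail ;;
        *)                label=ok ;;
    esac
    mv "$tmp" "$tmp.$label.trace"
    echo "$label: $out"
)
```

Repeating the call until both an .ok.trace and a .fail.trace file exist should give one trace for each state.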
Former Nagios Employee
-
christiandunn1
- Posts: 18
- Joined: Wed Mar 23, 2011 6:32 pm
- Location: Edmonton
Re: SNMP Service Check Timeout
Ok I've run the commands with strace and have uploaded a success and a failure.
-
avandemore
- Posts: 1597
- Joined: Tue Sep 27, 2016 4:57 pm
Re: SNMP Service Check Timeout
The plugin you are using is actually quite simple and sends a standard SNMP query. This is almost certainly caused by some type of networking issue or by the SNMP host simply not responding in time. The strace makes it pretty clear that the plugin is simply timing out.
Do you consistently see the packets arrive at the destination? And what of their route back?
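To check both directions, one option is to capture UDP port 161 traffic with tcpdump on the Nagios server and on the target, then compare how many packets go to port 161 with how many come back from it. The sketch below only counts lines by direction in tcpdump's text output; the function name count_snmp_directions and the capture file path are assumptions. Retransmitted requests are counted as requests, so the "unanswered" figure is only a rough indicator.

```shell
# Count SNMP requests (to port 161) and responses (from port 161)
# in `tcpdump -nn` text output read from stdin.
# count_snmp_directions is a hypothetical helper name.
# Capture first:  tcpdump -nn -i any -w /tmp/snmp.pcap udp port 161 and host prdsdb1
# Then analyze:   tcpdump -nn -r /tmp/snmp.pcap | count_snmp_directions
count_snmp_directions() {
    awk '
        / > [^ ]+\.161: / { req++ }    # destination port is 161
        / [^ ]+\.161 > /  { resp++ }   # source port is 161
        END { printf "requests=%d responses=%d unanswered=%d\n", req, resp, req - resp }
    '
}
```

If the capture on the target shows every request arriving and a response leaving, but the capture on the Nagios server is missing responses, the loss is on the return path; if the target never sends a response, the SNMP agent itself is the more likely suspect.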
Previous Nagios employee