Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
I just retested the timeout_state branch. It works fine for me, though an improperly specified community string will cause the error, as will specifying an invalid SNMP protocol version:
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
I've verified that the community string and protocol version are correct. The same command run against faster-responding sites shows no error. Here are the commands, one with a 3-second timeout and one with a 5-second timeout. The plugin timeout message only appears with -t 5.
./check_snmp -H 10.0.0.1 -C XXXX -o sysUpTime.0 -e 3 -t 5 -vvv
/usr/bin/snmpget -Le -t 5 -r 3 -m ALL -v 1 [authpriv] 10.0.0.1:161 sysUpTime.0
CRITICAL - Plugin timed out while executing system call
If you run the command against an IP that isn't alive with -t 3, -t 10, or -t 30, you may see the same result I do. What I'm trying to achieve is the same no-response message with -t 10 as with -t 3.
You are seeing two different timeout errors. One is the generic plugin timeout, and the other is the runcmd timeout. If your retries and timeout are really close, you may see this behavior. I will look into creating a bit more room for the external command to complete.
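To make the "really close" case concrete, here is a rough sketch of the arithmetic. It assumes snmpget waits `-t` seconds per attempt and makes one initial attempt plus `-r` retries, with no backoff. The plugin-side deadline formula and the `plugin_margin_s` parameter are purely hypothetical stand-ins for whatever headroom the plugin gives the external command; they are not taken from the check_snmp source.

```python
def snmpget_worst_case(timeout_s: int, retries: int) -> int:
    """Worst-case seconds before snmpget gives up on a dead host.

    Assumes one initial attempt plus `retries` retries, each waiting
    `timeout_s` seconds, with no backoff between attempts.
    """
    return timeout_s * (retries + 1)


def first_to_fire(timeout_s: int, retries: int, plugin_margin_s: int) -> str:
    """Say which timeout would trigger first against a dead host.

    ASSUMPTION: the plugin's own deadline is modelled as
    timeout_s * retries + plugin_margin_s. Both the formula and the
    margin are illustrative, not confirmed plugin behaviour.
    """
    plugin_deadline = timeout_s * retries + plugin_margin_s
    external = snmpget_worst_case(timeout_s, retries)
    if external < plugin_deadline:
        return "external command gives up first"
    if external > plugin_deadline:
        return "plugin timeout fires first"
    return "tie (race between the two timeouts)"


if __name__ == "__main__":
    # Same retry count as the command in this thread (-e 3).
    for t in (3, 5, 10, 30):
        print(f"-t {t}: {first_to_fire(t, retries=3, plugin_margin_s=5)}")
```

With the assumed 5-second margin and 3 retries, -t 3 lets the external command report its own no-response error, -t 5 is a tie, and larger values let the generic plugin timeout win. A larger margin pushes the crossover point higher, which is the "bit more room" idea.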
Okay, I've still been troubleshooting this (when time permits). I went back to plugins version 1.4.16 and Nagios v3.5.1. This plugin version does *NOT* produce the system call timeout message with the high timeout values. Plugins version 1.5 does. So it looks like something broke between 1.4.16 and 1.5.
Were you able to find a solution to this yet?
I ran across this problem today when a network interruption caused ~250 hosts to become unavailable and around 1300 SNMP checks to go critical at the same time. Alert messages spammed the mail server, and the MySQL database filled up the partition and crashed Nagios. It was basically a huge mess that took me all day to clean up. I'd like to make sure this kind of thing won't become a common occurrence.
phobbs wrote: I ran across this problem today when a network interruption caused ~250 hosts to become unavailable and around 1300 SNMP checks to go critical at the same time.
Could you let us know how this relates to a difference in status output text? It sounds like you just had a nasty network outage. The issues here with check_snmp relate to the text output when a plugin times out, but the state should stay the same . . . .