Has anyone else had issues with SNMP trap reliability, or does anyone have suggestions for making trap reception and translation more reliable in this situation? It's been spotty for me: the snmptrapd service has gotten into a bad state multiple times, which stops all traps from coming in, so the notifications that others and I rely on never fire when there are problems. Traps are still being sent from the remote end and things mostly appear normal, but traps suddenly stop showing up in the snmptt logs on the NagiosXI server.
From my perspective the problem seems to be with snmptrapd, but I'm not sure whether the snmptt service or the Python script that submits the passive check to Nagios can affect snmptrapd's state. Both the snmptrapd and snmptt services are still running when this happens. Restarting snmptt doesn't resolve the issue, but restarting snmptrapd does. I didn't think to check whether traffic was still arriving on port 162 before I restarted to fix it.
The only thing that catches my eye is that snmptraphandling.py hangs around in the process table when this happens. I set up a monitor to watch for this process, but I'm not certain it will always be there. Does anyone have thoughts on what might be causing this?
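For anyone who wants to watch for the same symptom, here is a rough sketch of a check that flags snmptraphandling.py processes that have been alive longer than a threshold. It assumes Python 3 and a `ps` that accepts `-eo pid=,etime=,args=`. The 60-second threshold and the function names are illustrative choices on my part, not anything shipped with NagiosXI.

```python
import subprocess

def parse_etime(etime):
    """Convert a ps 'etime' value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:          # pad missing hours with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

def find_stale(ps_output, needle="snmptraphandling.py", max_age=60):
    """Return (pid, age_seconds) for matching processes older than max_age."""
    stale = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)   # pid, etime, full command line
        if len(fields) < 3 or needle not in fields[2]:
            continue
        age = parse_etime(fields[1])
        if age > max_age:
            stale.append((int(fields[0]), age))
    return stale

def main():
    """Print a Nagios-style status line and return a plugin exit code."""
    out = subprocess.check_output(["ps", "-eo", "pid=,etime=,args="]).decode()
    stale = find_stale(out)
    if stale:
        print("CRITICAL: stale trap handler(s): %s" % stale)
        return 2
    print("OK: no stale trap handlers")
    return 0

# To use as a check script: import sys; sys.exit(main())
```

A handler that normally finishes in well under a second but has been around for a minute or more is a reasonable hint that it is hung, though a missing process does not prove snmptrapd is healthy.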
On a separate but semi-related note that might help others: the default install of snmptrapd came with a broken stop function in /etc/init.d/snmptrapd. When I ran stop or restart (which calls stop, then start) against snmptrapd, it didn't work, and this output was sent to standard error:
Stopping snmptrapd: pidof: invalid options on command line!
pidof: invalid options on command line!
The stop function calls killproc (a function imported from /etc/init.d/functions) with "-On" as an argument, which caused the issue. Since killproc is executing a kill, I don't think an "O" (the letter, not zero) or a literal "n" is a valid option. Maybe the O was meant to be a "0" (zero), which can be passed to kill, but that only performs error checking and doesn't actually send a signal. To fix it in my situation I removed the "-On". The snmptrapd process itself does accept the "-On" option, so my guess is that this was just a mistake and the flags ended up in the wrong spot.
RHEL 6.2 + NagiosXI 2011R3.3
Thanks!
Bryant
SNMP Trap Reliability
Re: SNMP Trap Reliability
I found that snmptt is running in standalone mode, and snmptrapd waits on snmptt to finish its work before processing additional traps. Snmptt in turn waits on the program it execs to return before it continues, so it very well could be that snmptraphandling.py hangs and clogs up the works.
From http://snmptt.sourceforge.net/docs/snmp ... ile-format...
Standalone or daemon mode:
The SNMPTRAPD program blocks when executing traphandle commands. This means that if the program called never quits, SNMPTRAPD will wait forever. If a trap is received while the traphandler is running, it is buffered and will be processed when the traphandler finishes. I do not know how large this buffer is.
The program called by SNMPTT (EXEC) blocks SNMPTT. If you call a program that does not return, SNMPTT will be left waiting. In standalone mode, this would cause snmptrapd to wait forever also.
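Given that blocking chain, one possible mitigation is to make sure the exec'd program can never run forever. The following is only a sketch, assuming Python 3: `run_with_timeout` is a hypothetical helper name and the 10-second limit is arbitrary. It kills the direct child on timeout, not any grandchildren it may have spawned.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_s=10):
    """Run cmd and return its exit code, or None if it had to be killed.

    subprocess.run() kills the child and reaps it when the timeout
    expires, so the caller is never left blocked on a hung handler.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None

def main(argv):
    """Wrap the command given on the command line (hypothetical usage)."""
    if len(argv) < 2:
        print("usage: wrapper.py COMMAND [ARGS...]", file=sys.stderr)
        return 64
    code = run_with_timeout(argv[1:])
    if code is None:
        print("handler timed out and was killed", file=sys.stderr)
        return 1
    return code

# To use as a wrapper script: sys.exit(main(sys.argv))
```

The idea would be to point the EXEC line at a wrapper like this instead of calling the handler directly, so a hung handler costs at most the timeout rather than stalling snmptt and snmptrapd indefinitely. I have not verified this against the stock NagiosXI configuration.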
scottwilkerson
Re: SNMP Trap Reliability
I am going to lock this thread; please continue the discussion at
http://support.nagios.com/forum/viewtop ... =16&t=7464