snmptt stops processing traps after Nagios restarts

eclypse
Posts: 50
Joined: Thu Dec 01, 2011 4:55 pm

Post by eclypse »

I've sent this into the support email, but figured I would post here in the event other customers have seen this behavior:

I installed the SNMP trap handling as per this guide back on 11/16. Everything was working fine, and traps were being received and translated to Nagios alerts. Since then, however, I have been experiencing random snmptt crashes every few days. When this happens, no alerts are created for any traps that are received, and /var/log/snmptt/snmptt.log shows no new events. After I restart snmptt, processing seems to pick back up where it left off, but this is less than ideal. Further analysis shows that the failures coincide with Nagios restarts, typically when I'm applying a configuration change, but they don't happen on every restart.

Preliminary troubleshooting suggests that snmptt hangs when it tries to call snmptraphandling.py when the Nagios pipe (/usr/local/nagios/var/rw/nagios.cmd) isn't there (due to the Nagios restart).

I received an updated version of snmptraphandling.py from support which wraps a try-except clause around the opening of the Nagios pipe, but this has not solved the issue thus far.

Code:

<         try:
<             output = open('/usr/local/nagios/var/rw/nagios.cmd', 'w')
<             results = "[" + mytime + "] " + "PROCESS_SERVICE_CHECK_RESULT;" \
<                 + host + ";" + service + ";" \
<                 + return_code + ";" + mondata_res + "\n"
<             output.write(results)
<         except Exception, e:
<             logger.error(e)
---
>     output = open('/usr/local/nagios/var/rw/nagios.cmd', 'w')
>     results = "[" + mytime + "] " + "PROCESS_SERVICE_CHECK_RESULT;" \
>         + host + ";" + service + ";" \
>         + return_code + ";" + mondata_res + "\n"
>     output.write(results)
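
One possible explanation, not confirmed anywhere in this thread: on POSIX systems a plain open() of a FIFO for writing blocks until some process opens it for reading, so if nagios.cmd exists but Nagios is not reading it yet, the call waits instead of raising, and the try-except never fires. The sketch below shows a non-blocking variant. The function name submit_passive_result and its parameters are hypothetical, and it is written for Python 3 while the original script appears to be Python 2, so treat it as an illustration rather than a drop-in patch.

```python
import errno
import os
import time


def submit_passive_result(cmd_pipe, host, service, return_code, output):
    """Write one PROCESS_SERVICE_CHECK_RESULT line without blocking.

    Returns True if the line was written, False if the pipe is missing
    or no process currently has it open for reading.
    (Hypothetical helper, not part of snmptraphandling.py.)
    """
    line = "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n" % (
        int(time.time()), host, service, return_code, output)
    try:
        # With O_NONBLOCK, opening a FIFO write-only fails with ENXIO
        # when there is no reader, instead of waiting for one.
        fd = os.open(cmd_pipe, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as e:
        if e.errno in (errno.ENXIO, errno.ENOENT):
            return False
        raise  # any other error is unexpected; let the caller see it
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)
    return True
```

A caller could log and drop the trap, or retry after a short sleep, when the function returns False. Which of those is acceptable depends on how important each trap is.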
For now, I'm forced to monitor my snmptt logs and alert if no events have been processed in the past 15 minutes, and restart snmptt. A typical restart looks like this.

Code:

[root@nagiosxi libexec]# /etc/init.d/snmptt restart
Stopping snmptt:                                           [FAILED]
Starting snmptt: PID file: /var/run/snmptt.pid
                                                           [  OK  ]
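
The log watchdog described above (alert and restart when nothing has been processed for 15 minutes) could be sketched roughly as follows. This is a minimal sketch that assumes the log file's modification time is a good enough proxy for "an event was processed"; the function name log_is_stale and its parameters are hypothetical.

```python
import os
import time


def log_is_stale(log_path, max_age_seconds=15 * 60, now=None):
    """Return True if log_path was not modified within max_age_seconds.

    A missing or unreadable log file is treated as stale.
    (Hypothetical helper for illustration only.)
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(log_path)
    except OSError:
        return True
    return (now - mtime) > max_age_seconds
```

A cron job could call log_is_stale("/var/log/snmptt/snmptt.log") and run /etc/init.d/snmptt restart when it returns True. The check is only meaningful when traps arrive regularly, as the heartbeat traps do here.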
Here's my version information:

Code:


Nagios XI Version : 2012R1.5
nagiosxi-01-blr 2.6.32-220.17.1.el6.i686 i686
CentOS release 6.2 (Final)
Gnome is not installed

# rpm -qa | grep snmptt
snmptt-1.3-3.nagios.noarch
 
# cat /etc/sysconfig/snmptrapd
# snmptrapd command line options
# OPTIONS="-Lsd -p /var/run/snmptrapd.pid"
OPTIONS="-Lsd -On -p /var/run/snmptrapd.pid"
 
# cat /etc/snmp/snmptrapd.conf
disableAuthorization yes
traphandle default /usr/local/sbin/snmptthandler
 
# ps aux | grep snmptt
root     10382  0.0  0.0   4352   744 pts/1    S+   15:55   0:00 grep snmptt
root     30873  0.0  0.0  16424  8040 ?        Ss   Dec06   0:01 /usr/bin/perl /usr/local/sbin/snmptt --daemon
root     30875  0.0  0.1  16480  9080 ?        Ss   Dec06   0:29 /usr/bin/perl /usr/local/sbin/snmptt --daemon
 
# ps aux | grep snmptrapd
root      5617  0.0  0.0  20156  5804 ?        Ss   Dec05   0:17 /usr/sbin/snmptrapd -Lsd -On -p /var/run/snmptrapd.pid
root     11186  0.0  0.0   4356   756 pts/1    S+   15:56   0:00 grep snmptrapd
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: snmptt stops processing traps after Nagios restarts

Post by abrist »

Try stopping snmptt, killing all snmptt processes, and starting snmptt once again:

Code:

service snmptt stop
killall snmptt
ps -aef|grep snmptt
service snmptt start
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
eclypse
Posts: 50
Joined: Thu Dec 01, 2011 4:55 pm

Re: snmptt stops processing traps after Nagios restarts

Post by eclypse »

abrist wrote:Try stopping snmptt, killing all snmptt processes, and starting snmptt once again:

Code:

service snmptt stop
killall snmptt
ps -aef|grep snmptt
service snmptt start
I have done this, but the process will still crash. I think I'm hitting a race condition where a trap is being received or processed during the restart of Nagios. We have about 15 or so ESX servers that we receive SNMP traps from. These send heartbeat traps about once every 5 minutes, so on average we see at least one trap per minute coming into Nagios. This high rate of traps might be contributing, but I'm hoping there's a solution that doesn't involve removing or reducing the heartbeats.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: snmptt stops processing traps after Nagios restarts

Post by abrist »

Just out of curiosity, and for testing's sake: if you reduce the heartbeat rate, is the problem mitigated?
ahmad.zuhd
Posts: 44
Joined: Sun Jul 01, 2012 2:33 am

Re: snmptt stops processing traps after Nagios restarts

Post by ahmad.zuhd »

I have the same problem, and support gave me the same reply they gave you.

What I'm trying to do now is restart snmptt whenever a configuration is applied.

As far as I know, /usr/local/nagiosxi/html/includes/components/ccm/includes/applyconfig.inc.php takes care of applying the configuration, so I inserted the following command:

Code:

exec("/usr/bin/sudo /etc/init.d/snmptt restart");
I have already allowed the apache user to restart snmptt:

Code:

# grep snmptt /etc/sudoers
NAGIOSXI ALL = NOPASSWD:/etc/init.d/snmptt restart
NAGIOSXIWEB ALL = NOPASSWD:/etc/init.d/snmptt restart
However, snmptt is not restarting.

Any idea how I can see the log (results) of executing /usr/local/nagiosxi/html/includes/components/ccm/includes/applyconfig.inc.php when I hit Apply Configuration?
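
PHP's exec() returns only the last line of standard output and does not capture stderr unless it is redirected, so a failing sudo call can go unnoticed. One way to see what the restart command actually reports is to run it through a small wrapper that records the exit code and the combined output. The sketch below is a hypothetical illustration in Python, not part of Nagios XI; the function names are made up for this example.

```python
import subprocess


def run_and_capture(argv):
    """Run argv and return (exit_code, combined stdout and stderr text)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return proc.returncode, out.decode("utf-8", "replace")


def format_log_line(argv, exit_code, output):
    """Build a single log line describing the command and its result."""
    return "cmd=%r exit=%d output=%r" % (
        " ".join(argv), exit_code, output.strip())
```

For example, calling run_and_capture(["/usr/bin/sudo", "/etc/init.d/snmptt", "restart"]) as the apache user and appending format_log_line(...) to a file that user can write would show whether sudo itself is refusing the command.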
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: snmptt stops processing traps after Nagios restarts

Post by slansing »

It appears you already have a ticket open for this problem; the last four digits of the ticket number are 0017.

We're going to lock this thread and post it as a note on the ticket so that the tech who is assisting you can see any progress made here. We like to help everyone in as timely a manner as possible, but it is logistically impossible to carry on correspondence with different techs about the same issue in multiple places, as pieces can get lost in translation. Locking as unresolved; correspondence continues on the ticketing system.