I have been testing with traps and the snmptrap command in two environments:
CentOS, on the Amazon cloud, using the Nagios XI appliance
RHEL, a manual install of Nagios XI
It seems that when I create traps in the CentOS/cloud environment, I somehow get many more than expected. The file /var/log/snmpttunknown.log grows quickly with far more traps than I issued, as if a loop had been created. I am using a virtual private cloud and issuing the snmptrap command from another server. That other (non-Nagios) server is running snmpd, snmptrapd, and snmptt. Perhaps one or more of these services are not needed?
When I set up a similar environment on RHEL (also a VM) using NSTI, I get similar results, except this time the MySQL database starts growing like crazy. I deleted the snmptt database on Thursday, but soon saw traps from earlier in the week reappearing! Magic?! They won't die or stop, and it has now crashed my database.
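One way to rule out the sending side is to drive net-snmp's snmptrap CLI from a small wrapper so that exactly one trap leaves the host per invocation. This is only a sketch; the manager address, community string, and test OID below are placeholders, not values from this setup:

```python
import subprocess

def build_snmptrap_cmd(manager, community="public",
                       trap_oid=".1.3.6.1.4.1.8072.2.3.0.1"):
    """Build an snmptrap argv that emits exactly one SNMPv2c trap.
    The empty string is the sysUpTime field; snmptrap fills it in."""
    return ["snmptrap", "-v", "2c", "-c", community, manager, "", trap_oid]

def send_one_trap(manager):
    # Runs the CLI once, so exactly one trap is sent from this host.
    subprocess.run(build_snmptrap_cmd(manager), check=True)
```

If one invocation of this still produces many entries in snmpttunknown.log, the multiplication is happening on the receiving side.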
Any ideas on what could be causing the replication of SNMP traps?
Thanks!
Zombie SNMP traps --- come back from dead and spawn!
Re: Zombie SNMP traps --- come back from dead and spawn!
Latest theory is that Nagios creates some log file of traps. When the database is cleared, does Nagios attempt to rebuild the database from the log files?
sreinhardt (-fno-stack-protector)
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm
Re: Zombie SNMP traps --- come back from dead and spawn!
So are you solely using the trap-generating script, or do you have a system or two set to send traps to the Nagios server? The script itself should never send more than a single trap per execution. There is a spool in /var/spool/snmptt. While Nagios itself does not poll for traps that have not been sent, snmptt/snmpd definitely do look for these traps and will send them to Nagios if they have not been cleared yet.
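A quick way to see whether uncleared traps are piling up is to look at that spool directly. A minimal sketch, assuming the default spool path mentioned above and an arbitrary one-hour age threshold:

```python
import os
import time

def stale_spool_files(spool_dir="/var/spool/snmptt", max_age_s=3600):
    """List spooled trap files older than max_age_s seconds; anything
    lingering here is a trap that has not been cleared yet."""
    now = time.time()
    stale = []
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_s:
            stale.append(path)
    return stale
```

A steadily growing result from this across runs would point at traps being re-read rather than cleared.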
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
Re: Zombie SNMP traps --- come back from dead and spawn!
Hi Spenser,
Can you please explain the clearing process? What is meant by "cleared" in "snmptt/snmpd definitely do look for these traps and will send them to nagios if they have not been cleared yet"?
We have three other systems that are sending traps to Nagios. Most traps are "handled" or "noticed" in other systems. We were hoping to filter many of them out of NSTI using a MySQL stored procedure. We may get thousands of traps per day; many have already been "handled" in other systems, and we certainly cannot manually process them again in Nagios.
This may explain the "Zombie" effect.
We could probably use a "sanity check" session with you or another Nagios engineer as soon as you can arrange it.
Thanks!
GL
sreinhardt
Re: Zombie SNMP traps --- come back from dead and spawn!
To be 100% honest, I just discovered this spool about a week ago, after we had a different but similar issue where traps were extremely delayed coming in but always eventually showed up. I believe it works just like any other spool: traps come in and are handled in FIFO order. I can see what you are saying about not wanting to be alerted for traps that are already handled, and I can definitely see this causing something of a zombie effect. Let me do a bit more digging on how you might clean that out during off hours. Otherwise, you could look into how the traps are handled by the EVENT lines in snmptt.conf and alter the script to drop traps you already know about.
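Along the lines of that last suggestion, the EXEC handler script could bail out early for trap OIDs already handled by the other systems, before anything is written toward Nagios. A sketch only; the set contents are an example (using the alertTrap OID discussed in this thread), and the real list would come from your own inventory:

```python
# OIDs already handled by the other monitoring systems (example entry).
HANDLED_ELSEWHERE = {
    ".1.3.6.1.4.1.1031.9.1.0.10",
}

def should_forward(trap_oid):
    """Return False for traps already handled elsewhere, so the EXEC
    handler can exit before submitting anything to Nagios."""
    return trap_oid not in HANDLED_ELSEWHERE
```

The handler would call this with the trap OID it received and simply `sys.exit(0)` when it returns False.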
Re: Zombie SNMP traps --- come back from dead and spawn!
OK. Thank you for your insights and for doing more digging.
Do you know if it is possible to tell snmptt or nagios (or some other process) that these traps do not require "clearing"?
Does it matter how the trap EVENTs are defined in snmptt.conf? Normal, Warning, or Critical? Do Normal events require clearing? Do Warning events?
Thanks again,
GL
Code:
#
#EVENT alertTrap .1.3.6.1.4.1.1031.9.1.0 "Status Events" Normal
EVENT alertTrap .1.3.6.1.4.1.1031.9.1.0.10 "Status Events" Critical
FORMAT $*
EXEC /usr/local/bin/snmptraphandling.py "$r" "SNMP Traps" "$s" "$@" "$-*" "$*"
SDESC
Variables:
1: alertTrapUpdateType
2: alertTrapAlertId
3: alertTrapControlM
4: alertTrapMemName
5: alertTrapOrderId
6: alertTrapSeverity
7: alertTrapStatus
8: alertTrapTime
9: alertTrapUser
10: alertTrapUpdateTime
11: alertTrapMessage
12: alertTrapOwner
13: alertTrapGroup
14: alertTrapApplication
15: alertTrapJobName
Re: Zombie SNMP traps --- come back from dead and spawn!
Not sure if this helps, but the Python script executed for the EVENT in snmptt.conf is quite short and writes to /usr/local/nagios/var/rw/nagios.cmd.
It seems there are three possible severity codes: 0 for Normal/Informational, 1 for Warning/Minor, and 2 for Critical/Severe.
Not sure what Nagios does with these severity codes. Might this be your trigger for "clearing"?
Thanks again,
GL
Code:
#!/usr/bin/env python
"""
Written by Francois Meehan (Cedval Info)
First release 2004/09/15
Modified by Nagios Enterprises, LLC.
This script receives input from sec.pl concerning translated snmptraps
*** Important note: sec must send DATA within quotes
Ex: ./services.py <HOST> <SERVICE> <SEVERITY> <TIME> <PERFDATA> <DATA>
"""
import sys

def printusage():
    print "usage: services.py <HOST> <SERVICE> <SEVERITY> <TIME> <PERFDATA> <DATA>"
    sys.exit()

def check_arg():
    # All six arguments are required; bail out with usage on any missing one.
    try:
        host = sys.argv[1]
        service = sys.argv[2]
        severity = sys.argv[3]
        mytime = sys.argv[4]
        mondata_res = sys.argv[6] + " / " + sys.argv[5]
    except IndexError:
        printusage()
    return (host, service, severity, mytime, mondata_res)

def get_return_code(severity):
    # Map the trap severity to a Nagios plugin-style return code.
    severity = severity.upper()
    if severity in ("INFORMATIONAL", "NORMAL"):
        return_code = "0"
    elif severity in ("WARNING", "MINOR"):
        return_code = "1"
    elif severity in ("SEVERE", "MAJOR", "CRITICAL"):
        return_code = "2"
    else:
        printusage()
    return return_code

def post_results(host, service, mytime, mondata_res, return_code):
    # nagios.cmd is a named pipe; close it so nagios picks up the command.
    output = open('/usr/local/nagios/var/rw/nagios.cmd', 'w')
    results = "[" + mytime + "] " + "PROCESS_SERVICE_CHECK_RESULT;" \
              + host + ";" + service + ";" \
              + return_code + ";" + mondata_res + "\n"
    output.write(results)
    output.close()

# Main routine...
if __name__ == '__main__':
    (host, service, severity, mytime, mondata_res) = check_arg()  # validate parameters
    return_code = get_return_code(severity)
    post_results(host, service, mytime, mondata_res, return_code)
sreinhardt
Re: Zombie SNMP traps --- come back from dead and spawn!
Yes, that is perfectly normal and expected. nagios.cmd is a direct command pipe into the currently running Nagios process, not a spool at all. The severity codes are used to determine what state Nagios should show for the trap, just like exit codes for plugins.
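For completeness, the external-command line that ends up in that pipe has a fixed shape; here is a sketch of composing and submitting one (host and service names are placeholders), using a with-block so the pipe write is flushed and closed in one step:

```python
import time

def service_check_result(host, service, return_code, output, ts=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line,
    the same format the handler script sends into nagios.cmd."""
    if ts is None:
        ts = int(time.time())
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        ts, host, service, return_code, output)

def submit(line, pipe="/usr/local/nagios/var/rw/nagios.cmd"):
    # Open, write, close: the running nagios process reads the pipe.
    with open(pipe, "w") as cmd_pipe:
        cmd_pipe.write(line)
```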
Re: Zombie SNMP traps --- come back from dead and spawn!
Resolution:
Spenser found that the directory /var/spool/snmptt was owned by root, with group root. This meant that the snmptt process could not delete spooled trap files after they had been added to the database, so old traps were repeatedly re-added and the database grew without bound. Thank you, Spenser! He fixed everything with:
Code:
chown -R snmptt.root /var/spool/snmptt/
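In the spirit of that fix, a quick sanity check: deleting a file requires write and execute permission on its directory, so the snmptt user needs both on the spool. A minimal sketch, meant to be run as the snmptt user against the default spool path:

```python
import os

def spool_deletable(spool="/var/spool/snmptt"):
    """True if the current user can remove entries from the spool:
    unlinking a file requires write+execute on its directory."""
    return os.access(spool, os.W_OK | os.X_OK)
```

If this returns False for the snmptt user, processed trap files will linger and be re-read, exactly the zombie behavior described above.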