Zombie SNMP traps --- come back from dead and spawn!

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
c6391925
Posts: 28
Joined: Thu May 23, 2013 4:54 pm

Post by c6391925 »

I have been testing with traps and the snmptrap command in 2 environments:

CentOS - on the Amazon cloud using the Nagios XI appliance
RHEL - manual install of Nagios XI

When I create traps in the CentOS/cloud environment, I get many more than expected. The file /var/log/snmpttunknown.log grows quickly, with many more traps than I issued, as if a loop had been created somehow. I am using a virtual private cloud, with another server issuing the snmptrap command. That other (non-Nagios) server is running snmpd, snmptrapd and snmptt. Perhaps one or more of these services are not needed?

When I set up a similar environment in RHEL (also a VM) using NSTI, I get similar results, except that this time the MySQL database starts growing like crazy. I deleted the snmptt database on Thursday, but soon saw traps from earlier in the week reappearing! Magic?!?! They won't die or stop! It has now crashed my database.

Any ideas on what could be causing the replication of SNMP traps?

Thanks!

Post by c6391925 »

Latest theory is that Nagios creates some log file of traps. When the database is cleared, does Nagios attempt to rebuild the database from the log files?
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Post by sreinhardt »

So are you solely using the trap-generating script, or do you have a system or two set to send traps to the Nagios server? The script itself should never send more than a single trap each time it is executed. There is a spool in /var/spool/snmptt. While Nagios itself is not polling for traps that have not been sent, snmptt\snmpd definitely do look for these traps and will send them to Nagios if they have not been cleared yet.
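
For anyone wanting to see whether files are piling up in that spool, here is a minimal Python sketch. It assumes only the /var/spool/snmptt path mentioned above; the function name list_spooled_traps is made up for illustration and is not part of snmptt or Nagios. It lists the regular files in the directory with their age, oldest first.

```python
import os
import time


def list_spooled_traps(spool_dir="/var/spool/snmptt"):
    """Return (filename, age_in_seconds) for each regular file, oldest first.

    list_spooled_traps is a hypothetical helper, not an snmptt tool.
    """
    now = time.time()
    entries = []
    for name in os.listdir(spool_dir):
        path = os.path.join(spool_dir, name)
        if os.path.isfile(path):
            # Age is measured from the file's last modification time.
            entries.append((name, now - os.path.getmtime(path)))
    # Largest age first, so long-lived files appear at the top.
    entries.sort(key=lambda item: item[1], reverse=True)
    return entries


if __name__ == "__main__":
    spool = "/var/spool/snmptt"
    if os.path.isdir(spool):
        for name, age in list_spooled_traps(spool):
            print("%8.0fs  %s" % (age, name))
    else:
        print("spool directory not found: %s" % spool)
```

Files that stay in the list for a long time, or that never disappear, would suggest the spool is not being emptied after processing.
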
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.

Post by c6391925 »

Hi Spenser,

Can you please explain the clearing process? Or what is meant by "cleared" in "snmptt\snmpd definitely do look for these traps and will send them to nagios if they have not been cleared yet." ???

We have 3 other systems that are sending traps to Nagios. Most traps are "handled" or "noticed" in those other systems. We were hoping to filter many of them out of NSTI by using a MySQL stored procedure. We may get thousands of traps per day. Many of them have already been "handled" elsewhere, and we certainly cannot manually process them again in Nagios.

This may explain the "Zombie" effect.

We could probably use a "sanity check" session with you or another Nagios engineer as soon as you can arrange it.

Thanks!

GL

Post by sreinhardt »

To be 100% honest, I only discovered this spool about a week ago, after we had a different but similar issue where traps were extremely delayed coming in but always eventually showed up. I believe it works just like any other spool: traps come in and are handled in FIFO order. I can see what you are saying about not wanting to be alerted for traps that are already handled, and I could definitely see this causing somewhat of a zombie effect. Let me do a bit more digging on how you might clean that out during off times. Otherwise, you could probably look into how the traps are handled by the EVENT lines in snmptt.conf and alter the script to drop traps you already know about.
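
As a rough illustration of "alter the script to drop traps you already know about", here is a minimal Python sketch of such a check. Everything in it is an assumption: the status strings "HANDLED" and "NOTICED" are placeholders taken from the wording earlier in the thread, not values any sending system is known to emit, and should_forward is a hypothetical helper that a handler script could call before writing anything to Nagios.

```python
# Status values treated as "already dealt with elsewhere". These strings are
# assumptions for illustration; substitute whatever the sending system emits.
ALREADY_HANDLED = frozenset(["HANDLED", "NOTICED"])


def should_forward(trap_status, handled_statuses=ALREADY_HANDLED):
    """Return True if a trap with this status should still be passed on."""
    # Normalise case and surrounding whitespace before comparing.
    return trap_status.strip().upper() not in handled_statuses


if __name__ == "__main__":
    for status in ("Handled", "Not_Noticed", " noticed "):
        action = "forward" if should_forward(status) else "drop"
        print("%s -> %s" % (status.strip(), action))
```

A handler would call should_forward() with the status field of the trap and simply exit without submitting a check result when it returns False.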

Post by c6391925 »

OK. Thank you for your insights and for doing more digging.

Do you know if it is possible to tell snmptt or nagios (or some other process) that these traps do not require "clearing"?

Does it matter how the trap EVENTS are defined in snmptt.conf ??? Normal, Warning, or Critical? Do Normal events require clearing? Do Warning events require clearing?

thanks again,

GL

Code:

#
#EVENT alertTrap .1.3.6.1.4.1.1031.9.1.0 "Status Events" Normal
EVENT alertTrap .1.3.6.1.4.1.1031.9.1.0.10 "Status Events" Critical
FORMAT $*
EXEC /usr/local/bin/snmptraphandling.py "$r" "SNMP Traps" "$s" "$@" "$-*" "$*"
SDESC

Variables:
  1: alertTrapUpdateType
  2: alertTrapAlertId
  3: alertTrapControlM
  4: alertTrapMemName
  5: alertTrapOrderId
  6: alertTrapSeverity
  7: alertTrapStatus
  8: alertTrapTime
  9: alertTrapUser
  10: alertTrapUpdateTime
  11: alertTrapMessage
  12: alertTrapOwner
  13: alertTrapGroup
  14: alertTrapApplication
  15: alertTrapJobName

Post by c6391925 »

Not sure if this helps, but the Python script executed for the EVENT in snmptt.conf is quite short and writes to /usr/local/nagios/var/rw/nagios.cmd.
It seems there are 3 possible severity codes: 0 for Normal/Informational, 1 for Warning/Minor, and 2 for Critical/Severe/Major.

Not sure what Nagios does with these severity codes. Might this be your trigger for "clearing"?

Thanks again,

GL


Code:

#!/usr/bin/env python

"""
Written by Francois Meehan (Cedval Info)
First release 2004/09/15
Modified by Nagios Enterprises, LLC.

This script receives input from sec.pl concerning translated snmptraps

*** Important note: sec must send DATA within quotes


Ex: ./services.py <HOST> <SERVICE> <SEVERITY> <TIME> <PERFDATA> <DATA>
"""

import sys


def printusage():
    print("usage: services.py <HOST> <SERVICE> <SEVERITY> <TIME> <PERFDATA> <DATA>")
    sys.exit()


def check_arg():
    try:
        host = sys.argv[1]
    except:
        printusage()
    try:
        service = sys.argv[2]
    except:
        printusage()
    try:
        severity = sys.argv[3]
    except:
        printusage()
    try:
        mytime = sys.argv[4]
    except:
        printusage()
    try:
        mondata_res = sys.argv[6] + " / " + sys.argv[5]
    except:
        printusage()
    return (host, service, severity, mytime, mondata_res)

def get_return_code(severity):
    severity = severity.upper()
    if severity == "INFORMATIONAL":
        return_code = "0"
    elif severity == "NORMAL":
        return_code = "0"
    elif severity == "SEVERE":
        return_code = "2"
    elif severity == "MAJOR":
        return_code = "2"
    elif severity == "CRITICAL":
        return_code = "2"
    elif severity == "WARNING":
        return_code = "1"
    elif severity == "MINOR":
        return_code = "1"
    else:
        printusage()
    return return_code


def post_results(host, service, mytime, mondata_res, return_code):
    # Build one external-command line for the Nagios command pipe.
    results = "[" + mytime + "] " + "PROCESS_SERVICE_CHECK_RESULT;" \
        + host + ";" + service + ";" \
        + return_code + ";" + mondata_res + "\n"
    # The with-block closes the pipe, so the command is flushed promptly.
    with open('/usr/local/nagios/var/rw/nagios.cmd', 'w') as output:
        output.write(results)


# Main routine...
if __name__ == '__main__':
    (host, service, severity, mytime, mondata_res) = check_arg()  # validating
                                                                  # parameters
    return_code = get_return_code(severity)
    post_results(host, service, mytime, mondata_res, return_code)

Post by sreinhardt »

Yes, that is perfectly normal and expected. The nagios.cmd file is a direct command pipe into the currently running Nagios process, not a spool at all. The severity codes determine what state Nagios should show for the trap, just like exit codes for plugins.
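
To make the exit-code analogy concrete, here is a minimal Python sketch of the line the handler script above writes to the pipe. The severity table mirrors the script's get_return_code(); the names SEVERITY_TO_STATE and build_check_result, and the host "web01" in the example, are made up for illustration.

```python
import time

# Severity names from the handler script, mapped to Nagios service states
# (0 = OK, 1 = WARNING, 2 = CRITICAL).
SEVERITY_TO_STATE = {
    "INFORMATIONAL": 0, "NORMAL": 0,
    "WARNING": 1, "MINOR": 1,
    "SEVERE": 2, "MAJOR": 2, "CRITICAL": 2,
}


def build_check_result(host, service, severity, output, timestamp=None):
    """Return one PROCESS_SERVICE_CHECK_RESULT line, newline included.

    Raises KeyError for a severity name that is not in the table.
    """
    state = SEVERITY_TO_STATE[severity.upper()]
    if timestamp is None:
        timestamp = int(time.time())
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, state, output)


if __name__ == "__main__":
    # "web01" and the message text are placeholder values.
    print(build_check_result("web01", "SNMP Traps", "Critical", "link down"),
          end="")
```

Each line written this way is processed once by the running Nagios process; nothing is stored in the pipe for later replay.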

Post by c6391925 »

Resolution:

Spenser found that the directory /var/spool/snmptt was owned by user root and group root. This meant that the snmptt process could not delete the spool files after their traps had been added to the database, so the same old traps were added again and again and the database grew out of control. Spenser fixed everything with:

Code:

chown -R snmptt:root /var/spool/snmptt/
Thank you Spenser! :D
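
For anyone who lands here with the same symptoms, a minimal Python sketch of a check for this condition follows. It assumes the spool path and the snmptt account name from the fix above; user_name and spool_problems are hypothetical helper names, and the script only reports problems, it changes nothing.

```python
import os
import pwd
import stat


def user_name(uid):
    """Return the account name for uid, or the numeric uid as text if unknown."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def spool_problems(spool_dir, expected_user):
    """Return a list of human-readable problems with spool_dir (empty if none)."""
    problems = []
    info = os.stat(spool_dir)
    owner = user_name(info.st_uid)
    if owner != expected_user:
        problems.append("%s is owned by %s, expected %s"
                        % (spool_dir, owner, expected_user))
    # Deleting files from a directory needs write permission on the directory.
    if not info.st_mode & stat.S_IWUSR:
        problems.append("%s is not writable by its owner" % spool_dir)
    return problems


if __name__ == "__main__":
    spool = "/var/spool/snmptt"
    if os.path.isdir(spool):
        for problem in spool_problems(spool, "snmptt"):
            print(problem)
    else:
        print("spool directory not found: %s" % spool)
```

No output means the directory is owned by the expected account and is owner-writable; it does not check the individual files inside the spool.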