Monitoring a cluster with SNMP Traps

Post by **Matthew.Cary** » Thu Feb 20, 2020 5:53 pm

We have a Nutanix cluster that we are trying to set up to alert on SNMP traps.

Our difficulty is that the management node of the cluster has a name and an IP, but the SNMP traps can come from any node in the cluster. These are not node specific traps, but they can come from any node because of how Nutanix is designed. A "cluster running out of disk space" might come from any of the component nodes, not the management node.

Right now the only way we can figure out to catch these alerts is to set up each and every node as a host, but we are hoping to avoid that.

How can we set up Nagios to catch SNMP traps from any of 10 or 20 nodes and interpret them as being from one Host?

Or am I thinking of this incorrectly?

Post by **mbellerue** » Fri Feb 21, 2020 1:48 pm

A quick way to do this would be to go to Admin -> SNMP Trap Interface and then add a trap definition for your OIDs. In the Add A Trap Definition tab, you can specify the event, the OID, severity, etc, as well as which Host the trap will be applied to. That way, no matter which device sends in that specific OID, the trap will only go to the specified host.

There are some example traps already defined in the Defined Traps tab. It might be worth looking at those, and then heading over to the Advanced tab aaaaaalllllll the way to the right, and sending in a custom trap just to make sure everything is working the way you would expect.

Post by **Matthew.Cary** » Fri Feb 21, 2020 2:56 pm

I think we might be talking past each other. Let me ask another way.

I have 6 Nutanix clusters. Each has multiple nodes. All are running the Nutanix OS and will send similar traps.

Cluster 1 has 8 nodes
Cluster 2 has 3 nodes
Cluster 3 has 10 nodes
etc

I don't want to set up each node as a host, I would prefer to set up each cluster as a host in Nagios. When a trap is sent from Cluster1 it can originate from any of the 8 nodes, each with its own IP. That kind of distributed origin is just how Nutanix rolls, they don't funnel everything through their management IP. Nagios knows about the cluster, but doesn't know about each node.

My issue is, I may get a trap that says "Nutanix Error #3" from any node in cluster 1, 2, or 3 (or 4-6). I need that trap to be associated with the correct cluster, so I can't just set a definition that says "Nutanix Error #3 is always from cluster2", because that would funnel *all* the traps of that type from all the clusters to just cluster2 (would it not?).
I feel like I should be able to set up a range of trap origin IPs to associate with each cluster, but I'm not sure if that's possible. Or do I have to set up each node as a separate host, which is what I'm trying to avoid?

Post by **mbellerue** » Fri Feb 21, 2020 5:03 pm

Oh yeah, we're on the same page. The part that I missed was that there are multiple clusters. That changes things. Now we need to ssh into the Nagios system and start playing around with files.

But first, you still need to create a Host object, just like in the explanation above. Except this time you're going to create 3 Host objects, one to represent each cluster. Each Host object also needs a Service object for SNMP traps to be sent to.

Hosts:
Cluster 1
Cluster 2
Cluster 3

Services, all passive:
Cluster 1 SNMP Trap
Cluster 2 SNMP Trap
Cluster 3 SNMP Trap

Next, I am going to presume that each cluster has its own subnet, just to help keep things nice and clean. If this isn't the case, you are going to have to find some piece of commonality so that you can match up which cluster an incoming trap came from.

Cluster 1 - 8 nodes = 192.168.1.0/24
Cluster 2 - 3 nodes = 192.168.2.0/24
Cluster 3 - 10 nodes = 192.168.3.0/24

There are two important files in question here.

/etc/snmp/snmptt.conf
/etc/snmp/snmptt.conf.nxti

What's happening here is that the Nagios web interface stores the SNMP trap definitions you configure in its database, and when the configuration is applied, it writes them out to /etc/snmp/snmptt.conf.nxti. Do not edit this file. Doing so would result in the same problems as manually editing a host or service configuration file.

But, you can use it to get a sort of template definition, if you already have an SNMP trap defined in the GUI. Grab your trap definition out of snmptt.conf.nxti (if you have one, if not, grab one of the example trap definitions), and we will put it in /etc/snmp/snmptt.conf. But do remember, if you are pulling one of your Nutanix traps out of snmptt.conf.nxti, you will want to remove it from the web interface before you add it to snmptt.conf.

Here is a generic example:

Code:

EVENT .1.3.6.1.4.1.12356.100.1.3.0.999 "Status Events" Normal
FORMAT Received trap "$N" with variables "$+*"
EXEC php /usr/local/nagiosxi/scripts/nxti.php --event_name="$N"  --event_oid="$i" --numeric_oid="$o" --symbolic_oid="$O" --community="$C" --trap_hostname="$R" --trap_ip="$aR" --agent_hostname="$A" --agent_ip="$aA" --category="$c" --severity="$s" --uptime="$T" --datetime="$x $X" --unixtime="$@" --bindings="$+*"
EXEC /usr/local/bin/snmptraphandling.py "$r" "SNMP Traps" "$s" "$@" "$-*" "$*"
SDESC
Trap sent for diagnostic purposes by an administrator.
Variables:
  1: fnSysSerial
  2: sysName
EDESC

This is roughly what one of the example trap definitions looks like. You're going to modify it, like so.

Code:

EVENT .1.3.6.1.4.1.12356.100.1.3.0.999 "Status Events" Normal
FORMAT Received trap "$N" with variables "$+*"
EXEC php /usr/local/nagiosxi/scripts/nxti.php --event_name="$N"  --event_oid="$i" --numeric_oid="$o" --symbolic_oid="$O" --community="$C" --trap_hostname="$R" --trap_ip="$aR" --agent_hostname="$A" --agent_ip="$aA" --category="$c" --severity="$s" --uptime="$T" --datetime="$x $X" --unixtime="$@" --bindings="$+*"
EXEC /usr/local/bin/snmptraphandling.py "Cluster 1" "Cluster 1 SNMP Trap" "$s" "$@" "$-*" "$*"
MATCH $aA: 192.168.1.0/24
SDESC
Trap sent for diagnostic purposes by an administrator.
Variables:
  1: fnSysSerial
  2: sysName
EDESC

And because I can't use formatting in a code block, here are the key differences.

Original:
EXEC /usr/local/bin/snmptraphandling.py "$r" "SNMP Traps" "$s" "$@" "$-*" "$*"

Updated:
EXEC /usr/local/bin/snmptraphandling.py "Cluster 1" "Cluster 1 SNMP Trap" "$s" "$@" "$-*" "$*"
MATCH $aA: 192.168.1.0/24

Now, what the heck is happening here? We are configuring snmptt and telling it what to do when it receives OID .1.3.6.1.4.1.12356.100.1.3.0.999. We are going to define this OID 3 times, once for each cluster. Change the above EXEC and MATCH lines accordingly.
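If it helps to see the three variants side by side, here is a minimal Python sketch that prints the EXEC and MATCH pair for each cluster. The `CLUSTERS` table and the function name are made up for illustration, and it assumes the host and service names from the example above; the rest of each EVENT block stays exactly as you copied it.

```python
# Hypothetical table: Nagios host name -> subnet, taken from the example above.
CLUSTERS = {
    "Cluster 1": "192.168.1.0/24",
    "Cluster 2": "192.168.2.0/24",
    "Cluster 3": "192.168.3.0/24",
}


def per_cluster_lines(clusters):
    """Yield the (EXEC, MATCH) pair that differs in each copy of the definition."""
    for host, subnet in clusters.items():
        exec_line = (
            f'EXEC /usr/local/bin/snmptraphandling.py "{host}" '
            f'"{host} SNMP Trap" "$s" "$@" "$-*" "$*"'
        )
        match_line = f"MATCH $aA: {subnet}"
        yield exec_line, match_line


if __name__ == "__main__":
    for exec_line, match_line in per_cluster_lines(CLUSTERS):
        print(exec_line)
        print(match_line)
        print()
```

Each printed pair replaces the snmptraphandling.py EXEC line in one copy of the definition.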

Once you have defined the OID 3 times, one for each cluster, save the file, restart snmptt, and send a trap from one of your clusters. When this is done, any one of the nodes in cluster 1 (for example) could send out a trap using the specified OID, and based on the IP address that sent the trap (the node's IP address), snmptt will see which IP range it belongs to, and assign the trap to the correct Host object in Nagios.
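Before sending a test trap, it can be handy to double-check which Host object a given node address should land on. The sketch below only mirrors the subnet lookup described above using Python's standard `ipaddress` module; it is not how snmptt implements MATCH internally, and the subnets are the assumed ones from this example.

```python
import ipaddress

# Assumed subnets from the example above; replace with your real ranges.
CLUSTER_SUBNETS = {
    "Cluster 1": ipaddress.ip_network("192.168.1.0/24"),
    "Cluster 2": ipaddress.ip_network("192.168.2.0/24"),
    "Cluster 3": ipaddress.ip_network("192.168.3.0/24"),
}


def cluster_for(ip):
    """Return the Nagios host name whose subnet contains ip, or None if no range matches."""
    addr = ipaddress.ip_address(ip)
    for host, network in CLUSTER_SUBNETS.items():
        if addr in network:
            return host
    return None


if __name__ == "__main__":
    for node_ip in ("192.168.1.17", "192.168.3.250", "10.0.0.5"):
        print(node_ip, "->", cluster_for(node_ip))
```

An address that falls outside every range comes back as None, which is a hint that a node sits in a subnet you have not written a MATCH line for yet.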

This is the deep dark side of SNMP, so if this isn't terribly clear, let me know, and I will try to explain it better.

Reference, keep this handy:
http://snmptt.sourceforge.net/docs/snmp ... ile-format

Post by **Matthew.Cary** » Fri Mar 13, 2020 10:50 am

I got this working a couple weeks ago & all seems well from a functional point of view, so a *BIG* thank you there. I would not have figured this out on my own.

But I have another question.

As I had over 750 traps defined after letting the MIB import wizard digest the Nutanix MIB, I ended up copy/pasting the values from /etc/snmp/snmptt.conf.nxti into a new config file for each cluster:
/etc/snmp/Cluster01_snmptt.conf
/etc/snmp/Cluster02_snmptt.conf
/etc/snmp/Cluster03_snmptt.conf
and so on
Each one was edited via some find/replace work to add the cluster name and an IP range per your instructions above.
I then modified the TrapFiles section of /etc/snmp/snmptt.ini to add each of the above files in addition to /etc/snmp/snmptt.conf and /etc/snmp/snmptt.conf.nxti
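For anyone repeating the find/replace step, here is a rough Python sketch of the same idea. It assumes every imported definition carries the stock handler line with "$r" and "SNMP Traps" exactly as in the generic example earlier in the thread; the `retarget` function name is hypothetical, and the output should be checked by eye before snmptt is pointed at it.

```python
import re

# Matches the stock handler line: "$r" as the host and "SNMP Traps" as the service.
HANDLER_RE = re.compile(
    r'^(EXEC /usr/local/bin/snmptraphandling\.py) "\$r" "SNMP Traps"(.*)$',
    re.MULTILINE,
)


def retarget(conf_text, host, service, subnet):
    """Point every stock handler line at one host/service and add a MATCH line after it."""
    def repl(match):
        return (
            f'{match.group(1)} "{host}" "{service}"{match.group(2)}\n'
            f"MATCH $aA: {subnet}"
        )
    return HANDLER_RE.sub(repl, conf_text)


if __name__ == "__main__":
    sample = (
        'EVENT .1.3.6.1.4.1.12356.100.1.3.0.999 "Status Events" Normal\n'
        'FORMAT Received trap "$N" with variables "$+*"\n'
        'EXEC /usr/local/bin/snmptraphandling.py "$r" "SNMP Traps" "$s" "$@" "$-*" "$*"\n'
        "SDESC\n"
        "EDESC\n"
    )
    print(retarget(sample, "Cluster 1", "Cluster 1 SNMP Trap", "192.168.1.0/24"))
```

Running it once per cluster against a copy of the definitions would give the contents for files like Cluster01_snmptt.conf, Cluster02_snmptt.conf, and so on.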

I then disabled all the auto-created trap definitions in NagiosXI

At that point, things were working!

Thing of it is, none of this configuration is visible in NagiosXI. All my configs were done in SNMPTT and I ended up disabling the traps NagiosXI knows about, which as I understand live in /etc/snmp/snmptt.conf.nxti. So if I want traps to show up in the GUI they need to be in snmptt.conf.nxti, but every time XI restarts itself it overwrites this file so I can't just edit it.

Is there any way to pull this config info into Nagios without hand creating each of my 3000 traps in the GUI? Or do some kind of bulk modify that touches traps?

Post by **mbellerue** » Fri Mar 13, 2020 5:45 pm

Okay, I think you went a slightly different route than I was going for. But upon reading it, yours is probably the better route. In my explanation, I was thinking that all SNMP traps from a given cluster would report to a single SNMP Trap service. Hence, this bit.

EXEC /usr/local/bin/snmptraphandling.py "Cluster 1" "Cluster 1 SNMP Trap" "$s" "$@" "$-*" "$*"

The idea is that you would set stalking and volatile on that service, and then you would be alerted/notified every time a trap came in to that service. E.g. if you had 2 (or more) SNMP trap worthy events from a cluster, one after the other.

However, what you are driving for is an SNMP Trap service per OID. Which, you are absolutely correct, would take a good long while to manually create 3000 services... unless you used the API to create those passive SNMP Trap services. Then this bit here is defined per OID, based on however you want to name those services.

EXEC /usr/local/bin/snmptraphandling.py "Cluster 1" "Cluster 1 SNMP Trap OID.1" "$s" "$@" "$-*" "$*"
EXEC /usr/local/bin/snmptraphandling.py "Cluster 1" "Cluster 1 SNMP Trap OID.2" "$s" "$@" "$-*" "$*"

So on and so forth. Then, there's the tricky bit about the snmptt.conf.nxti and your Cluster1.conf, Cluster2.conf, Cluster3.conf. You should be able to just add these to the /etc/snmp/snmptt.ini. At the bottom should be an area labeled [TrapFiles]. Add your configuration files there, and now snmptt has your Cluster configs.
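As a rough illustration of that last step, the Python sketch below renders a [TrapFiles] section for the per-cluster files named earlier in the thread. It assumes the stock snmptt.ini layout, where the files are listed between `snmptt_conf_files = <<END` and `END`; compare against your own snmptt.ini before pasting anything in.

```python
# File names taken from earlier in the thread; adjust to whatever you created.
TRAP_FILES = [
    "/etc/snmp/snmptt.conf",
    "/etc/snmp/snmptt.conf.nxti",
    "/etc/snmp/Cluster01_snmptt.conf",
    "/etc/snmp/Cluster02_snmptt.conf",
    "/etc/snmp/Cluster03_snmptt.conf",
]


def render_trapfiles(paths):
    """Build the text of a [TrapFiles] section listing one config file per line."""
    lines = ["[TrapFiles]", "snmptt_conf_files = <<END", *paths, "END"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_trapfiles(TRAP_FILES), end="")
```

Restart snmptt after changing the list so it rereads the files.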

Nagios Support Forum
