Nagios 2014R1.1 is MAGICALLY removing ports from my Switches

Posted: Wed Jun 11, 2014 4:43 pm
by mberkley
Okay... I'm completely baffled here. I was having issues polling devices with the Network Switch wizard over SNMP because of an issue in an older version.
Today you sent me an updated wizard, but I'm experiencing something almost mystical.


When I query the device using SNMP, the wizard finds hundreds of ports on this particular switch.
Example:
[Screenshot]

It was successfully added to the monitoring engine.
[Screenshot]

Then, when I list the services, Nagios "cleans / removes / purges(?)" hundreds of the ports it found during SNMP discovery, leaving only 35.
[Screenshot]

To make it more confusing, I then get this error on multiple services: "/var/lib/mrtg/******.rrd does not exist"
[Screenshot]

And within moments of erroring out, the missing *.rrd errors simply disappear.
[Screenshot]

Can you explain why this happens every time we add a new switch? I have an entire datacenter to onboard, and this never happened in the 2012 version.

I am cleaning out the inactive ports, so their status is irrelevant. I checked the cron log to see whether the deadpool process is removing the missing ports, but the deadpool log is empty.

I'm absolutely baffled as to why this is happening in 2014.

Posted: Thu Jun 12, 2014 10:24 am
by mberkley
Thought a restart would help... Still having the same issues.

Posted: Thu Jun 12, 2014 11:13 am
by snapon_admin
I don't think it's removing the ports; I think it's merging them, because it looks like you're using the same name for multiple services. It's hard to see in that picture, but I can see the following if I zoom in:

    Port                  Port name
    GigabitEthernet2/1    vlan32-210
    GigabitEthernet2/2    vlan32-210
    GigabitEthernet2/3    vlan32-210
    GigabitEthernet2/4    vlan32-210

You have the wizard set to use the port name (description) as the service name, and several ports share the same name. Unless you change the names, Nagios will create only one check per name. As for the "/var/lib/mrtg/******.rrd does not exist" error, my best guess is that the RRD files for those particular ports hadn't been created yet. Since the error went away moments later, it looks like creating those RRD files simply took longer than running the first check on those ports.
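
To make the merging concrete, here is a minimal Python sketch. It is not the wizard's actual code; it only assumes, as described above, that the service name comes from the port description, and models the resulting service list as a dictionary keyed by that name. Which port ends up behind the surviving service ("last one wins" here) is also an assumption of the model.

```python
# Illustrative model of "one service per name" (an assumption,
# not the Nagios XI wizard's actual code).

# Ports as listed in the wizard: (port, port name/description).
ports = [
    ("GigabitEthernet2/1", "vlan32-210"),
    ("GigabitEthernet2/2", "vlan32-210"),
    ("GigabitEthernet2/3", "vlan32-210"),
    ("GigabitEthernet2/4", "vlan32-210"),
]


def build_services(ports):
    """Key services by description; a repeated name overwrites the earlier entry."""
    services = {}
    for port, description in ports:
        services[description] = port  # same key, so still a single service
    return services


services = build_services(ports)
print(f"{len(ports)} ports -> {len(services)} service(s)")
print(services)
```

Running it prints `4 ports -> 1 service(s)` followed by `{'vlan32-210': 'GigabitEthernet2/4'}`: four discovered ports, one service.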

I'm not a Nagios tech, but that's what appears to be happening. What kind of switch is this? We have Nexus switches in our main data centers, and those do have a ton of ports. Note that the wizard finds not only physical ports but also subinterfaces and VLANs.

Posted: Thu Jun 12, 2014 11:20 am
by BanditBBS
I'll chime in and say I agree 100% with what snapon said. Those ports all have the same name, which is why they only show up once.

Posted: Thu Jun 12, 2014 11:58 am
by snapon_admin
I had to dig up this command since I couldn't remember it off the top of my head. When the switch wizard runs, it finds anything with an ifIndex number, which, as I said, includes all physical interfaces, subinterfaces, port channels, VLANs, etc. To see a list of ifIndex values and the ports they correspond to, run the following command on your switch (this assumes a Cisco switch running IOS):

    show snmp mib ifmib ifindex

If you have a Cisco Nexus, the command is:

    sh interface snmp-ifindex

If that list matches the list in the wizard, you're good. If you don't have Cisco switches, I'm not sure what the commands to list the index numbers would be, but I wouldn't be surprised if there were one.

Posted: Thu Jun 12, 2014 5:19 pm
by mberkley
snapon_admin wrote: I don't think it's removing the ports, I think it's merging them because it looks like you're using the same name for the service. [...]

I think you are ABSOLUTELY right! That must be what's happening. I wonder if there is a way to avoid it. This is a Nexus 7000, the core switch at this particular location. Unfortunately, I can't (and certainly don't want to) take on the responsibility of renaming the ports so that each one is unique. So I need to find out whether there is an option to import the ports by their location name instead. Or, perhaps more importantly, a way to go back, rescan, and amend the configuration if the networking team renames these ports to something more meaningful. Guys, I really appreciate the feedback... I'm hoping Nagios can suggest a workaround.
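
If "location name" means the interface name (e.g. GigabitEthernet2/1), which is unique on a given switch, one possible scheme is to lead each service name with it and keep the description only as context. The Python sketch below assumes exactly that; `make_service_name` and the " - " separator are hypothetical and not an existing wizard option.

```python
# Hypothetical naming scheme: build each service name from the interface
# name (unique per switch) and append the description as context.


def make_service_name(port, description):
    """Return a per-port service name such as 'GigabitEthernet2/1 - vlan32-210'."""
    description = description.strip()
    if description:
        return f"{port} - {description}"
    return port  # no description configured: fall back to the interface name


# Sample data taken from the port list earlier in this thread.
ports = [
    ("GigabitEthernet2/1", "vlan32-210"),
    ("GigabitEthernet2/2", "vlan32-210"),
    ("GigabitEthernet2/3", "vlan32-210"),
    ("GigabitEthernet2/4", "vlan32-210"),
]

names = [make_service_name(port, desc) for port, desc in ports]
for name in names:
    print(name)

# Interface names are unique on a device, so no two names collide.
print("all unique:", len(set(names)) == len(names))
```

With this scheme the four ports above yield four distinct names, and the check at the end prints `all unique: True`.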

Posted: Fri Jun 13, 2014 1:48 pm
by sreinhardt
Oh boy, that is a very interesting case, and one I'm sure we have not thought of. I think there are two issues here.
1) The switch wizard should recognize that these are separate ports despite the name being the same, and figure out how to handle it from there.
2) Adding multiple services with the same name would lead to mayhem, which I'm sure is why they are being removed.
I agree that asking your networking team to give them more useful names would be a good start, but I will also submit a bug report for this. My question is: how would you expect this to be handled? Should the wizard ask what service name to use instead, ask whether you want to create duplicates, or something else? I'm honestly not sure what the best corrective action from the wizard would be.
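
Purely as an illustration of one possible answer to that question, the Python sketch below keeps a port's description as its service name when it is unique and appends the interface name only when several ports share a description. The `disambiguate` function and the `core-uplink` sample port are hypothetical; this is not how the wizard currently behaves.

```python
from collections import Counter


def disambiguate(ports):
    """Map service names to ports, adding the interface name only on collisions.

    Hypothetical strategy: a unique description is kept as the service name;
    a description shared by several ports becomes "description (port)".
    """
    # Fall back to the interface name when a port has no description.
    bases = [(port, desc.strip() or port) for port, desc in ports]
    counts = Counter(base for _, base in bases)
    services = {}
    for port, base in bases:
        name = base if counts[base] == 1 else f"{base} ({port})"
        services[name] = port
    return services


ports = [
    ("GigabitEthernet2/1", "vlan32-210"),
    ("GigabitEthernet2/2", "vlan32-210"),
    ("GigabitEthernet2/3", "vlan32-210"),
    ("GigabitEthernet2/4", "vlan32-210"),
    ("GigabitEthernet2/5", "core-uplink"),  # hypothetical port with a unique name
]

for name, port in disambiguate(ports).items():
    print(f"{name:35} -> {port}")
```

Compared with always including the interface name, this leaves uniquely named ports untouched; a real implementation would still need to check that a generated name does not clash with an existing service.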