@dflick, it sounds like there are multiple issues being discussed here, so it's possible this isn't addressing the core one; however, your last post speaks to a specific pain point: running a wizard many times to add multiple instances of physically identical hardware. If that's correct, I'm wondering if you have tried the "Bulk Host Cloning and Import" wizard? It lets you select an existing host you are monitoring and create new monitored hosts with the same services as the selected one. You can enter many hosts at once there, which would at least reduce the repetition of entering a lot of identical hardware, assuming you don't need to run the wizard on each device to pull hardware-specific data from it.
Hopefully this is helpful, but please advise if not!
Bulk changes fail silently if config files can't be verified
- jmichaelson
- Posts: 375
- Joined: Wed Aug 23, 2023 1:02 pm
Re: Bulk changes fail silently if config files can't be verified
dflick wrote: Tue Nov 07, 2023 4:16 pm
I was told many years ago that the way we are doing it is the only supported way to get alerts on specific interface connections when each router has a different circuit ID on each interface (except for tunnels, which do have the same description as their base interface).
Is the circuit ID coming from the SNMP data that's coming from the router, or are you adding it after the fact?
Please let us know if you have any other questions or concerns.
-Jason
Re: Bulk changes fail silently if config files can't be verified
You raise an interesting question. I thought Nagios could only alert on the interface name or description (the two options in the router/switch wizard), so we use the description to carry the circuit ID. I already do some rewriting on traps to get better descriptions there, so is it possible to pull the descriptions with an SNMP get before alerting? This sounds interesting!
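For context on pulling descriptions over SNMP: on Cisco routers the configured interface description is normally exposed as IF-MIB::ifAlias, so a command along the lines of `snmpwalk -v2c -c <community> <router> IF-MIB::ifAlias` returns one line per interface. The sketch below only parses that default net-snmp text output into an ifIndex-to-description map; the ifIndex values in the example are hypothetical, and hooking this into the alert path (for example from a notification script) is an assumption, not an existing Nagios XI feature.

```python
from __future__ import annotations

import re

# One line of default net-snmp output, e.g.
#   IF-MIB::ifAlias.3 = STRING: ATT ASE AS/KQFN/001254/SW
# Output flags such as -Oq or -On change this format and are not handled here.
_IFALIAS = re.compile(r"^IF-MIB::ifAlias\.(\d+)\s*=\s*STRING:\s*(.*)$")


def parse_ifalias(output: str) -> dict[int, str]:
    """Map ifIndex -> ifAlias (the configured interface description)."""
    aliases: dict[int, str] = {}
    for line in output.splitlines():
        m = _IFALIAS.match(line.strip())
        if m:
            # Some output options wrap the string in double quotes; drop them.
            aliases[int(m.group(1))] = m.group(2).strip().strip('"')
    return aliases
```

Lines for other OIDs (such as ifDescr) are simply ignored, so the same function can be pointed at a larger walk.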
Today, we alert on a router interface by monitoring it by interface description, and that alert is sent to PagerDuty. We would love to de-dup somewhere in the chain, but PagerDuty sees each tunnel as a different alert because the alert text includes TunnelX, which is different for each tunnel that runs over the physical interface, even though the description is the same. If Nagios could reach out and pull information at alert time, that might help, because I can see how it could strip TunnelX from the alert so that de-dup would work.
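The de-dup idea can be sketched against PagerDuty's Events API v2, whose event body carries a `dedup_key`; trigger events that share a key are grouped into the same open alert. The sketch assumes the "<interface> <circuit description> <check type>" naming seen in this thread and derives the key from the host and circuit only, so TunnelX and the physical port collapse together. The routing key is a placeholder, the state-to-severity mapping is our own choice, and the function arguments are assumed to be filled from Nagios macros such as $HOSTNAME$, $SERVICEDESC$, $SERVICESTATE$ and $SERVICEOUTPUT$ by a notification script; actually sending the event is left out.

```python
import re

# Hypothetical naming convention inferred from the service list in this thread:
#   "<interface> <circuit description> <check type>"
_SERVICE = re.compile(r"^(?P<iface>\S+)\s+(?P<circuit>.+?)\s+(?P<check>Status|Bandwidth)$")

# Nagios service state -> Events API v2 severity (this mapping is an assumption).
_SEVERITY = {"CRITICAL": "critical", "WARNING": "warning", "UNKNOWN": "error"}


def dedup_key(host: str, service_description: str) -> str:
    """Drop the leading interface name so every service on one circuit shares a key."""
    m = _SERVICE.match(service_description)
    if m is None:
        # Services outside the convention (Ping, SNMP Traps) keep their own key.
        return f"{host}/{service_description}"
    return f"{host}/{m.group('circuit')}/{m.group('check')}"


def pagerduty_event(routing_key: str, host: str, service_description: str,
                    state: str, output: str) -> dict:
    """Build a 'trigger' event body; sending it is left to the caller."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key(host, service_description),
        "payload": {
            # The summary keeps the full service name so responders still see which tunnel fired.
            "summary": f"{host} {service_description} is {state}: {output}",
            "source": host,
            "severity": _SEVERITY.get(state, "error"),
        },
    }
```

One trade-off to weigh: with a shared key, a resolve event sent when any single tunnel recovers would resolve the shared alert, so a real integration would need its own rule for when to resolve.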
Is there any design engineering help available? It would be worth paying for some consulting hours to have someone look at our needs and suggest the best way forward. I was routed to two different partners, but neither had any experience with or suggestions for how to tackle the problem.
Thanks!
- jmichaelson
Re: Bulk changes fail silently if config files can't be verified
Bear with me, as I'm still trying to wrap my head around all of this as we go along. I have a search out internally to find better partners you could reach out to for consulting services, but in the meantime...
Is it correct to assume that there's a 1:1 mapping between circuit IDs and the non-tunnel interfaces on the router? And if so, is what you'd ideally want a way within Nagios XI to use, e.g., a templated configuration that ignores the description containing the circuit ID, while still having alerts from a given interface display the circuit ID, so the responder doesn't have to look up the circuit manually when an alert occurs?
Please let us know if you have any other questions or concerns.
-Jason
Re: Bulk changes fail silently if config files can't be verified
Yes, you are correct. There is a 1:1 relationship between circuit IDs and physical interfaces. Here is one example:
Config Name      Service Description
arlington-bk-f1  Ping
arlington-bk-f1  SNMP Traps
arlington-bk-r1  BDI911 LAN gateway Bandwidth
arlington-bk-r1  BDI911 LAN gateway Status
arlington-bk-r1  GigabitEthernet0/0/0 switch1 g1/0/24 Status
arlington-bk-r1  GigabitEthernet0/0/1 switch1 g2/0/24 Status
arlington-bk-r1  GigabitEthernet0/0/2 ATT ASE AS/KQFN/001254/SW Bandwidth  <- Physical Interface
arlington-bk-r1  GigabitEthernet0/0/2 ATT ASE AS/KQFN/001254/SW Status     <- Physical Interface
arlington-bk-r1  Ping
arlington-bk-r1  SNMP Traps
arlington-bk-r1  Tunnel10000 ATT ASE AS/KQFN/001254/SW Status              <- Logical Interface
arlington-bk-r1  Tunnel20000 ATT ASE AS/KQFN/001254/SW Status              <- Logical Interface
arlington-bk-r1  Tunnel30000 ATT ASE AS/KQFN/001254/SW Status              <- Logical Interface
arlington-bk-r1  Tunnel40000 ATT ASE AS/KQFN/001254/SW Status              <- Logical Interface
arlington-bk-s1  GigabitEthernet1/0/24 ARLINGTON-BK-r1 0/0/0 Bandwidth
The goal is to get only one alert if the tunnels and/or the physical interface fail. Since the logical interfaces run over the physical one, they go down if they can't communicate with the far-end device. The physical connection goes to a local provider device, which may stay up even when the link from that provider device to the WAN fails; that is why the tunnels are critical for detecting a failure.
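The "one alert per circuit" rule described above can be written down as a small decision function. This is not a Nagios XI feature; it is a sketch of the logic an external script might apply, assuming it already has the up/down state of the physical port and of each tunnel on that circuit (the `CircuitState` structure is hypothetical). A physical failure reports only the port, and a physical-up/all-tunnels-down combination is reported once as a likely provider-side failure.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CircuitState:
    """Hypothetical snapshot of one circuit: its physical port plus the tunnels riding on it."""
    circuit: str
    physical_up: bool
    tunnels_up: dict[str, bool] = field(default_factory=dict)  # tunnel name -> is it up?


def circuit_alert(state: CircuitState) -> Optional[str]:
    """Return at most one alert message for the whole circuit, or None if healthy."""
    if not state.physical_up:
        # Physical port down: the tunnels are down too, so report only the port.
        return f"{state.circuit}: physical interface down"
    down = sorted(name for name, up in state.tunnels_up.items() if not up)
    if down and len(down) == len(state.tunnels_up):
        # Port up but every tunnel down: likely the provider side of the circuit.
        return f"{state.circuit}: all tunnels down, physical interface up (provider side suspected)"
    if down:
        # Only some tunnels down: still a single message listing them.
        return f"{state.circuit}: tunnels down: {', '.join(down)}"
    return None
```

Whatever this returns could then be sent as a single notification, or combined with a shared PagerDuty dedup key so that any duplicates still collapse.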
Ideally, BPI would work for us, but I have opened many tickets on BPI and we were never able to resolve one issue: if we needed to modify a BPI, we would have to rebuild ALL relationships for all devices, which is not supportable. It seems like the wizard blows away and replaces the entire configuration. It also seemed impossible to create more than one BPI, and I would need one for each router. Is there a manual walkthrough or document on BPI that might help us manage it better?
As an alternative, we were trying to see if we could consolidate the alerts so that if the tunnels failed due to an underlying circuit failure on the provider side, we would get either a single alert or identical alerts that PagerDuty could de-dup.
Thoughts?
- jmichaelson
Re: Bulk changes fail silently if config files can't be verified
Yeah, I'm not seeing a way to consolidate that on the Nagios side of things. We do have the concept of a host parent, where, e.g., a switch or a router can be the "parent" of another host, so that if the parent is down the child hosts don't generate alerts. But we don't have the same concept for services. It seems like a useful idea, however, so I'm going to open a feature request for it; no guarantees that anything will come of it, of course, but I can certainly try.
Please let us know if you have any other questions or concerns.
-Jason
Re: Bulk changes fail silently if config files can't be verified
Thanks! Yep, that seems useful.
For BPI, the wizard used to reset the whole BPI configuration for all devices. Is there a way to build the BPI configuration manually? That may also work as a solution.
Thanks!
Re: Bulk changes fail silently if config files can't be verified
Any ideas on the BPI setup? That would also solve our issue, but I need to know whether it will be manageable. It would not be supportable if we had to redo the relationships on 100 routers every time we had a change. It would be awesome if that could be done via the API.
- jmichaelson
Re: Bulk changes fail silently if config files can't be verified
While writing the ticket, I had an idea that might work. In my experience, each interface on a router has a unique IP address. It would mean ending up with more hosts, but what if you set up each interface/circuit as its own host, and then had each tunnel be a service on that host? You could then make each interface host a child of the underlying router host being monitored. Any thoughts on whether this would work for you in the interim?
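To make the interface-as-host idea concrete, the sketch below generates Nagios object definitions for it: one host per physical interface, with the router listed in the standard `parents` directive, plus one service per tunnel on that interface host. As we understand Nagios Core's notification logic, service notifications are normally suppressed while their host is DOWN or UNREACHABLE, which is what would fold the tunnel alerts under a single interface alert. The template names (`generic-host`, `generic-service`), the host naming scheme, and the 192.0.2.10 address are assumptions for illustration; `check_command` is assumed to come from the template, and Nagios XI has its own process for importing hand-written object files, so treat this as the shape of the config rather than a drop-in file.

```python
def interface_host(router: str, iface: str, address: str,
                   template: str = "generic-host") -> str:
    """Nagios host object for one router interface, with the router as its parent."""
    # Slashes swapped for dashes only to keep host names easy to read and type.
    name = f"{router}-{iface.replace('/', '-')}"
    return "\n".join([
        "define host {",
        f"    use        {template}",
        f"    host_name  {name}",
        f"    address    {address}",
        f"    parents    {router}",
        "}",
    ])


def tunnel_service(host: str, tunnel: str, template: str = "generic-service") -> str:
    """Nagios service object for one tunnel on an interface host."""
    # check_command is assumed to be inherited from the template.
    return "\n".join([
        "define service {",
        f"    use                  {template}",
        f"    host_name            {host}",
        f"    service_description  {tunnel} Status",
        "}",
    ])


if __name__ == "__main__":
    # Hypothetical values: 192.0.2.10 is a documentation address, not a real interface IP.
    print(interface_host("arlington-bk-r1", "GigabitEthernet0/0/2", "192.0.2.10"))
    for t in ("Tunnel10000", "Tunnel20000", "Tunnel30000", "Tunnel40000"):
        print(tunnel_service("arlington-bk-r1-GigabitEthernet0-0-2", t))
```

Because the output is plain text built from a list of routers and interfaces, the same loop could be re-run after a change instead of editing each definition by hand.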
Please let us know if you have any other questions or concerns.
-Jason