Create alert when two VM hosts are online simultaneously
Posted: Mon Jun 15, 2020 10:39 am
TL;DR: I'm trying to create a service or command that alerts when check_ping returns the same state at the same time (both CRITICAL or both OK) for two VMs running on two different hypervisors. Is this doable in Nagios? If so, where should I start?
I administer a testbed environment where a single controller VM manages much of the hardware inside it. We're about to start testing a new controller VM, but we never want both controllers active simultaneously. I was asked whether my current Nagios setup could send an alert if both controller VMs are ever online at the same time. Both controllers are Linux KVM VMs (Debian 8 and Debian 10), each running on a separate Linux hypervisor. Both VMs are on the same LAN (192.168.60), which is separate from our Nagios host's LAN (192.168.4). We generally SSH in by proxying through a bastion host, and iptables on the controllers is locked down so that SSH is only allowed from the hypervisor (this iptables lockdown is not especially desirable in this scenario, but it's done to emulate our real-world environment). I bring this up because I think installing NRPE on the controller VMs is not viable, for both iptables and policy reasons. However, ping (by IP) works for both hypervisors and both controllers from the bastion host.
The idea I think would be simplest assumes I can put some kind of logic behind an NRPE service definition: Nagios would execute an NRPE command on our bastion host that runs check_ping against both controller VMs and applies exclusive-or logic, so that the check is OK only when exactly one VM answers and an alert is generated if both VMs return OK or both return CRITICAL. Is this doable in Nagios? I've been googling around, but I haven't found many hits on raising a critical (or, more specifically, an alert message) only when two checks both return CRITICAL (or, ideally, both CRITICAL or both OK).
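The wrapper idea above could be sketched as a small plugin that lives on the bastion host and is called through NRPE. This is a minimal sketch under stated assumptions, not a tested implementation: the path to check_ping is where Debian's monitoring-plugins package usually installs it, the ping thresholds are placeholders, and the names `ping_state`, `combine`, and `main` are hypothetical. A check_ping WARNING is treated as "reachable".

```python
#!/usr/bin/env python3
"""Sketch: report OK only when exactly one of two hosts answers ping."""
import subprocess
import sys

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumption: usual Debian location of the stock plugins; adjust as needed
CHECK_PING = "/usr/lib/nagios/plugins/check_ping"


def ping_state(host, check_ping=CHECK_PING):
    """Run check_ping against one host and return its exit code."""
    cmd = [check_ping, "-H", host,
           "-w", "100.0,20%", "-c", "500.0,60%", "-p", "3"]  # placeholder thresholds
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return UNKNOWN  # plugin missing, not executable, or hung
    return result.returncode


def combine(state_a, state_b):
    """Map two check_ping exit codes to one (exit code, message) pair."""
    valid = (OK, WARNING, CRITICAL)
    if state_a not in valid or state_b not in valid:
        return UNKNOWN, "UNKNOWN - could not determine state of both controllers"
    up_a = state_a != CRITICAL  # OK or WARNING counts as reachable
    up_b = state_b != CRITICAL
    if up_a and up_b:
        return CRITICAL, "CRITICAL - both controllers are online"
    if not up_a and not up_b:
        return CRITICAL, "CRITICAL - neither controller is online"
    return OK, "OK - exactly one controller is online"


def main(argv):
    """Check the two hosts given on the command line and print a status line."""
    if len(argv) != 3:
        print("UNKNOWN - usage: %s <controller-A-IP> <controller-B-IP>" % argv[0])
        return UNKNOWN
    code, message = combine(ping_state(argv[1]), ping_state(argv[2]))
    print(message)
    return code


# Only run when arguments are supplied, so the file can also be imported
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

On the bastion host this could be exposed with an nrpe.cfg line of the form `command[check_one_controller]=/path/to/script <controller-A-IP> <controller-B-IP>` (the command name and path are placeholders) and called from the Nagios host with check_nrpe like any other NRPE command. Doing the comparison inside one plugin means both pings happen in the same check run, so Nagios only ever sees a single service state.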
Another possible route is some kind of virsh-based NRPE plugin, which I imagine I would have to write myself. The major issue is that our hypervisors are kept in a specific state that matches our real-world environment, and I would absolutely not want to install anything on them unless there was no alternative (in which case this whole idea might just get scrapped). And, of course, NRPE port 5666 is not open on these VMs, and it would be a significant feat to convince management to alter the configuration away from our real-world setup for any reason.
Any advice, recommendations, or further reading would be greatly appreciated. Thanks so much!