Has anyone successfully built Nagios instances out in AWS that are load balanced in a way that doesn't duplicate alerting?
What I would like to achieve is to have Nagios servers running in multiple availability zones within AWS, so that monitoring functions as close to 100% of the time as possible. However, this presents challenges for some aspects of monitoring. For instance, if multiple Nagios instances are all polling a specific device and that device goes down, you are going to get multiple alerts for that one incident. I think inbound traps are easily solved with a load balancer, but I don't know what it means when only one of the three Nagios servers gets a trap.
What are people's thoughts on this design aspect?
Nagios redundancy in AWS
Re: Nagios redundancy in AWS
That sort of setup is likely to cause fragmented reporting, because only one XI instance receives the actual check information. That doesn't matter if you don't care about reporting or storing time-series data for your services; in that case the setup should work, assuming all contacts and notification settings are configured correctly.
Personally, if the primary concern is having multiple XI instances running while avoiding duplicate alerts, I'd ship alerts to a message queue (I made a dirt-simple RabbitMQ component) and let the queue's consumers deal with removing duplicate messages. It's a bit lazy, but it ensures each XI instance has a full copy of everything happening in your infrastructure. If you lose half of your infra and 3 of 4 XI instances happen to be in that half, you still have pretty comprehensive intel to work with from the remaining XI instance.
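A minimal sketch of the consumer-side deduplication, assuming each message has already been pulled off the queue and decoded. The `Alert` fields, the five-minute window, and the `Deduplicator` class are hypothetical; this is not the RabbitMQ component mentioned in the post or any RabbitMQ API, just the filtering a consumer could apply. The key deliberately leaves out which XI instance sent the alert, so the same notification arriving from several instances collapses into one.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical alert shape; field names are assumptions, not a Nagios XI schema.
@dataclass(frozen=True)
class Alert:
    source: str          # which XI instance published the message
    host: str
    service: str         # empty string for host alerts
    state: str           # e.g. "CRITICAL", "OK"
    contact_email: str   # destination, so two contacts are never merged

    def key(self) -> Tuple[str, str, str, str]:
        # `source` is excluded on purpose: identical alerts from different
        # XI instances must map to the same key.
        return (self.host, self.service, self.state, self.contact_email)


class Deduplicator:
    """Drop alerts whose key was already delivered within `window` seconds."""

    def __init__(self, window: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.window = window
        self.clock = clock
        self._seen: Dict[Tuple[str, str, str, str], float] = {}

    def should_deliver(self, alert: Alert) -> bool:
        now = self.clock()
        # Forget keys older than the window so memory stays bounded.
        expired = [k for k, t in self._seen.items() if now - t >= self.window]
        for k in expired:
            del self._seen[k]
        k = alert.key()
        if k in self._seen:
            return False
        self._seen[k] = now
        return True


if __name__ == "__main__":
    dedup = Deduplicator(window=300)
    incoming = [
        Alert("xi-a", "web01", "HTTP", "CRITICAL", "oncall@example.com"),
        Alert("xi-b", "web01", "HTTP", "CRITICAL", "oncall@example.com"),
        Alert("xi-c", "web01", "HTTP", "CRITICAL", "oncall@example.com"),
    ]
    delivered = [a for a in incoming if dedup.should_deliver(a)]
    print(len(delivered))  # 1
```

The window is the trade-off: too short and slightly out-of-sync instances let duplicates through; too long and a genuine CRITICAL, OK, CRITICAL flap inside the window has its second CRITICAL suppressed. Per-instance identifiers such as problem IDs are generated independently by each Nagios instance, so they can't serve as a shared key.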
As a note, the RabbitMQ sender I linked above would definitely have to be modified to include destination emails in this setup. Otherwise you lose all of the rich Nagios alerting logic and wind up doubling work.
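One way to keep the Nagios alerting logic is to publish one message per notification with the destination already resolved. Nagios runs the notification command once for each contact being notified, so `$CONTACTEMAIL$` holds a single address per invocation. The sketch assumes a hypothetical script, `send_alert.py`, configured as the notification command with the argument order shown in the comment; the field names and JSON layout are illustrative and not the format of the linked component. Publishing to RabbitMQ is left out.

```python
import json
import socket
import sys
from typing import List

# Hypothetical argument order; it must match whatever command_line you define, e.g.
#   /usr/local/bin/send_alert.py "$HOSTNAME$" "$SERVICEDESC$" "$SERVICESTATE$" \
#       "$CONTACTEMAIL$" "$NOTIFICATIONTYPE$"
FIELDS = ["host", "service", "state", "contact_email", "notification_type"]


def build_message(args: List[str], source: str) -> str:
    """Serialize one Nagios notification (for one contact) into a JSON body."""
    if len(args) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} arguments, got {len(args)}")
    payload = dict(zip(FIELDS, args))
    # Record which XI instance produced the message; a deduplicating consumer
    # should ignore this field, but it helps when debugging.
    payload["source"] = source
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    body = build_message(sys.argv[1:], source=socket.gethostname())
    # Hand `body` to your RabbitMQ client here (not shown).
    print(body)
```

Because the contact's address travels with the message, the consumer only has to send what it receives; who gets notified, and when, is still decided by Nagios.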
A nice thing about RabbitMQ is that it has native high-availability and failover options. Plus, you'd gain the resilience of not black-holing your alerts when email goes down.
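The "not black-holing" benefit depends on the consumer only acknowledging a message once the email has actually gone out; in RabbitMQ terms that means manual acknowledgements, with a failed delivery negatively acknowledged and requeued. The sketch below simulates that behavior with a plain `deque` so it runs without a broker. `drain`, `MailDown`, and the message strings are hypothetical, and none of this is the pika or RabbitMQ client API.

```python
from collections import deque
from typing import Callable, Deque, List


class MailDown(Exception):
    """Hypothetical error raised by the send function when the mail relay is unreachable."""


def drain(pending: Deque[str], send: Callable[[str], None]) -> List[str]:
    """Deliver queued alerts in order, removing ("acking") each only after send succeeds.

    If sending fails, the message stays at the head of the queue and this pass
    stops, mimicking a nack-with-requeue followed by a back-off.
    """
    delivered: List[str] = []
    while pending:
        msg = pending[0]        # peek; nothing is removed until delivery succeeds
        try:
            send(msg)
        except MailDown:
            break               # leave msg queued; retry on a later pass
        pending.popleft()       # the "ack": only now is the message gone
        delivered.append(msg)
    return delivered


if __name__ == "__main__":
    mail_up = False

    def send(msg: str) -> None:
        if not mail_up:
            raise MailDown(msg)

    q = deque(["web01 HTTP CRITICAL", "db01 PING CRITICAL"])
    print(drain(q, send), len(q))  # [] 2  (nothing lost while mail is down)
    mail_up = True
    print(drain(q, send), len(q))
```

A real consumer would also wait before the next pass rather than retrying immediately, so a dead mail relay isn't hammered.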
Former Nagios employee
https://www.mcapra.com/
Re: Nagios redundancy in AWS
Thanks as always for the input mcapra. Did this help, Dan?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.