Notification for when entire hostgroup is down

Posted: Tue Nov 14, 2017 1:10 pm
by matt_ps
Hello, we have Nagios XI in AWS to monitor our externally facing services. Each of our hosts is added to the hostgroup for its respective data center. We had a short outage at one data center last night and I got an onslaught of emailed alerts (which were delivered late, since our mail server resides in that data center). I'm looking to make the alerting a bit more intelligent and send a PagerDuty alert if an entire hostgroup (i.e. an entire data center) fails.

Is this possible? If so, how can this be configured and managed?

If it isn't natively supported, can I make a custom host check that uses a REST API call to query the statuses of the hostgroup's members? Is there a script out there already created for this purpose that I can borrow from? I hate to reinvent the wheel :roll:
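For illustration, a custom check along these lines could be sketched as below. This is only a sketch under assumptions, not an existing script: the `/api/v1/objects/hoststatus` path, the `apikey` parameter, and the `hoststatus`, `name`, and `current_state` JSON fields are assumed and should be verified against your Nagios XI version, and the function names `fetch_host_states` and `evaluate_all_down` are hypothetical. The host states (0 = UP, 1 = DOWN, 2 = UNREACHABLE) and plugin exit codes (0 = OK, 2 = CRITICAL, 3 = UNKNOWN) follow the usual Nagios conventions.

```python
import json
import urllib.parse
import urllib.request

# Nagios host states: 0 = UP, 1 = DOWN, 2 = UNREACHABLE.
HOST_UP = 0
# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def fetch_host_states(base_url, api_key, hostnames):
    """Return {hostname: current_state} for the given hosts.

    ASSUMPTION: the endpoint path and the JSON field names below are
    illustrative; check them against your Nagios XI API documentation.
    """
    query = urllib.parse.urlencode({"apikey": api_key})
    url = f"{base_url}/api/v1/objects/hoststatus?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    wanted = set(hostnames)
    return {
        row["name"]: int(row["current_state"])
        for row in payload.get("hoststatus", [])
        if row.get("name") in wanted
    }


def evaluate_all_down(group, states):
    """Map member host states to a (exit_code, message) pair.

    CRITICAL only when every member is in a non-UP state.
    """
    if not states:
        return UNKNOWN, f"UNKNOWN - no status data for hostgroup {group}"
    down = sorted(host for host, state in states.items() if state != HOST_UP)
    if len(down) == len(states):
        return CRITICAL, f"CRITICAL - all {len(states)} hosts in {group} are down"
    up = len(states) - len(down)
    return OK, f"OK - {up}/{len(states)} hosts in {group} are up"
```

A wrapper script would call `fetch_host_states(...)` with the hostgroup's member names, print the message from `evaluate_all_down(...)`, and exit with the returned code, so that a dedicated "data center" host or service could carry the PagerDuty notification.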

Re: Notification for when entire hostgroup is down

Posted: Tue Nov 14, 2017 5:31 pm
by npolovenko
Hi, @matt_ps. I'd probably go with a parent-child relationship here. The mail server in a data center would be the parent, and all other hosts within the host group would be its children. Once that is applied, if the mail server (parent) goes down, Nagios will automatically know not to send critical notifications for the other hosts (children) in the same group. Take a look here:
https://assets.nagios.com/downloads/nag ... ility.html

Re: Notification for when entire hostgroup is down

Posted: Tue Nov 14, 2017 5:42 pm
by matt_ps
Well that's fine for that one use case, but our other data centers don't have mail servers. Additionally, what if that mail server goes down for some other reason? Do we receive an alert saying the entire data center is down?

Does this mean that there's no native way to alert on hostgroups? Being able to report "40% of hosts in XYZ data center are down" and having that go through PagerDuty (our escalation platform) would be absolutely invaluable as well. Essentially we want PagerDuty to receive intelligent alerts telling us that "hey, maybe something bigger is going on here than just some issues with a few various hosts".
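To make the "40% of hosts are down" idea concrete, here is a small sketch of how such a percentage rule might be evaluated inside a custom check. It is an illustration only: the function name `hostgroup_down_percent` and the default thresholds (warning at 40%, critical at 100%) are assumptions chosen to match the example in this post, and the list of host states is expected to come from whatever status source you already query.

```python
def hostgroup_down_percent(group, states, warn_pct=40.0, crit_pct=100.0):
    """Return (exit_code, message) from a list of Nagios host states.

    states: iterable of ints where 0 = UP and anything else counts as down.
    Exit codes follow the plugin convention: 0 OK, 1 WARNING, 2 CRITICAL,
    3 UNKNOWN. The thresholds are hypothetical defaults.
    """
    states = list(states)
    if not states:
        return 3, f"UNKNOWN - hostgroup {group} has no members to evaluate"

    down = sum(1 for state in states if state != 0)
    pct = 100.0 * down / len(states)

    if pct >= crit_pct:
        code, label = 2, "CRITICAL"
    elif pct >= warn_pct:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"

    message = f"{label} - {pct:.0f}% of hosts in {group} are down ({down}/{len(states)})"
    return code, message
```

With two of five hosts down, `hostgroup_down_percent("XYZ", [1, 1, 0, 0, 0])` yields a WARNING at 40%; with every host down it yields CRITICAL, which is the result that would be routed to PagerDuty.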

Re: Notification for when entire hostgroup is down

Posted: Wed Nov 15, 2017 12:57 pm
by npolovenko
@matt_ps, if the mail server goes down, no alert emails will be sent out. They will spool up and be sent all at once when the mail server comes back up.
It looks like the BPI component is what you're looking for. You'll be able to receive alerts based on the percentage of hosts in a host group that are down. https://assets.nagios.com/downloads/nag ... BPI_v2.pdf

Re: Notification for when entire hostgroup is down

Posted: Wed Nov 15, 2017 2:21 pm
by matt_ps
That's exactly what I'm after, thank you!

Re: Notification for when entire hostgroup is down

Posted: Wed Nov 15, 2017 4:02 pm
by npolovenko
@matt_ps, not a problem! I'm going to close this thread as resolved then, but if you have any other questions, feel free to create a new one.