Notification for when entire hostgroup is down

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
matt_ps
Posts: 11
Joined: Mon Oct 09, 2017 11:50 am

Notification for when entire hostgroup is down

Post by matt_ps »

Hello, we have Nagios XI in AWS to monitor our externally facing services. Each of our hosts is added to the hostgroup for its respective data center. We had a short outage in one data center last night, and I got an onslaught of emailed alerts (whose delivery was delayed because our mail server resides in that data center). I'm looking to make the alerting a bit more intelligent and send a PagerDuty alert if an entire hostgroup (i.e. an entire data center) fails.

Is this possible? If so, how can this be configured and managed?

If it isn't natively supported, can I make a custom host check that uses a REST API call to query the statuses of the hostgroup's members? Is there an existing script for this purpose that I can borrow from? I hate to reinvent the wheel :roll:
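For reference, the evaluation half of such a custom check could look roughly like the sketch below. It assumes the member states have already been fetched (for example through the XI REST API; the fetch step, endpoint, and response format are deliberately left out and would need to be checked against your XI version's API documentation). It follows the standard Nagios plugin conventions: host states 0/1/2 for UP/DOWN/UNREACHABLE, exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, and a one-line status message with performance data after a pipe. The function name `check_hostgroup`, the hostgroup `dc-east`, and the 25%/50% thresholds are hypothetical.

```python
# Nagios host state codes
HOST_UP, HOST_DOWN, HOST_UNREACHABLE = 0, 1, 2
# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_hostgroup(name, member_states, warn_pct=25.0, crit_pct=50.0):
    """Evaluate a hostgroup from {host_name: host_state} and return
    (exit_code, status_line) in the usual plugin output format."""
    total = len(member_states)
    if total == 0:
        return UNKNOWN, f"UNKNOWN - hostgroup {name} has no members"
    # Count DOWN and UNREACHABLE members together as "not UP".
    not_up = sum(1 for state in member_states.values() if state != HOST_UP)
    pct = 100.0 * not_up / total
    # Performance data after the pipe: label=value;warn;crit;min;max
    perf = f"down_pct={pct:.1f}%;{warn_pct};{crit_pct};0;100"
    summary = f"{not_up}/{total} hosts in {name} are not UP ({pct:.1f}%)"
    if pct >= crit_pct:
        return CRITICAL, f"CRITICAL - {summary} | {perf}"
    if pct >= warn_pct:
        return WARNING, f"WARNING - {summary} | {perf}"
    return OK, f"OK - {summary} | {perf}"


# Hypothetical member states for one data center's hostgroup.
example = {"web01": HOST_DOWN, "web02": HOST_UP, "db01": HOST_DOWN, "mail01": HOST_UP}
code, status_line = check_hostgroup("dc-east", example)
print(code, status_line)
```

A wrapper script would print `status_line` and call `sys.exit(code)` so that Nagios picks up the resulting state; with two of four hosts down, the example above evaluates to CRITICAL.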
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Notification for when entire hostgroup is down

Post by npolovenko »

Hi, @matt_ps. I'd probably use a parent-child relationship here: make the mail server in each data center the parent and all other hosts in that hostgroup its children. Once that's in place, if the mail server (parent) goes down and the other hosts (children) in the same group fail their checks, Nagios marks them UNREACHABLE rather than DOWN, so it won't send DOWN notifications for them. Take a look here:
https://assets.nagios.com/downloads/nag ... ility.html
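To make the parent-child behavior concrete, here is a simplified model (not Nagios source code) of the documented reachability rule: a failed host with no parents, or with at least one parent UP, is DOWN; if every parent is DOWN or UNREACHABLE, it is UNREACHABLE. UNREACHABLE notifications are only sent when `u` is included in the relevant `notification_options`. The host names and the helper `should_notify` are hypothetical.

```python
UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


def classify_failed_host(host, parents, states):
    """Classify a host whose own check just failed.

    Mirrors the documented Nagios reachability rule: a host with no
    parents, or with at least one parent UP, is DOWN; if every parent
    is DOWN or UNREACHABLE, the host is UNREACHABLE instead.
    """
    host_parents = parents.get(host, [])
    if not host_parents or any(states.get(p) == UP for p in host_parents):
        return DOWN
    return UNREACHABLE


def should_notify(state, notification_options):
    """Check a comma-separated notification_options string such as 'd,r'
    ('d' = DOWN, 'u' = UNREACHABLE) for the flag matching `state`."""
    flags = {DOWN: "d", UNREACHABLE: "u"}
    wanted = flags.get(state)
    return wanted is not None and wanted in notification_options.split(",")


# Hypothetical layout: mail01 is the parent of web01 in one data center.
parents = {"web01": ["mail01"]}
states = {"mail01": DOWN}
state = classify_failed_host("web01", parents, states)
print(state, should_notify(state, "d,r"))  # UNREACHABLE False
```

Note that the parent itself (`mail01`, which has no parents) is still classified as DOWN, so its own outage is still notified.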
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
matt_ps
Posts: 11
Joined: Mon Oct 09, 2017 11:50 am

Re: Notification for when entire hostgroup is down

Post by matt_ps »

Well, that's fine for that one use case, but our other data centers don't have mail servers. Additionally, what if that mail server goes down for some other reason? Do we receive an alert saying the entire data center is down?

Does this mean there's no native way to alert on hostgroups? Being able to report "40% of hosts in XYZ data center are down" and having that go through PagerDuty (our escalation platform) would also be invaluable. Essentially, we want PagerDuty to deliver intelligent alerts that tell us, "Hey, maybe something bigger is going on here than a few issues with individual hosts."
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Notification for when entire hostgroup is down

Post by npolovenko »

@matt_ps, if the mail server goes down, no alert emails will be sent out. They queue up and are all sent at once when the mail server comes back up.
It looks like the BPI (Business Process Intelligence) component is what you're looking for. It lets you receive alerts based on the percentage of hosts in a host group that are down: https://assets.nagios.com/downloads/nag ... BPI_v2.pdf
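BPI performs this evaluation inside XI. Purely to illustrate the rule being requested (this is not BPI's implementation or configuration format), the sketch below groups flat host records by hostgroup and reports each group whose share of non-UP hosts meets a percentage threshold, producing the kind of "40% of hosts in XYZ data center are down" summary asked for earlier in the thread. The function name, hostgroup names, and host records are hypothetical.

```python
from collections import defaultdict

UP, DOWN, UNREACHABLE = 0, 1, 2  # Nagios host state codes


def groups_over_threshold(records, threshold_pct):
    """records: iterable of (host, hostgroup, state) tuples.
    Returns {hostgroup: percent_not_up} for groups at or above threshold_pct."""
    totals = defaultdict(int)
    failing = defaultdict(int)
    for _host, group, state in records:
        totals[group] += 1
        if state != UP:
            failing[group] += 1
    result = {}
    for group, total in totals.items():
        pct = 100.0 * failing[group] / total
        if pct >= threshold_pct:
            result[group] = pct
    return result


# Hypothetical snapshot: 2 of 5 hosts down in dc-east, 1 of 4 in dc-west.
records = [
    ("web01", "dc-east", DOWN),
    ("web02", "dc-east", DOWN),
    ("app01", "dc-east", UP),
    ("app02", "dc-east", UP),
    ("db01", "dc-east", UP),
    ("web01-w", "dc-west", UP),
    ("web02-w", "dc-west", DOWN),
    ("db01-w", "dc-west", UP),
    ("app01-w", "dc-west", UP),
]
for group, pct in sorted(groups_over_threshold(records, 40.0).items()):
    print(f"{pct:.0f}% of hosts in {group} are down")
```

With a 40% threshold, only `dc-east` is reported; `dc-west` (25%) stays below it. Each reported group would then be a candidate for a single PagerDuty alert instead of one alert per host.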
matt_ps
Posts: 11
Joined: Mon Oct 09, 2017 11:50 am

Re: Notification for when entire hostgroup is down

Post by matt_ps »

That's exactly what I'm after, thank you!
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Notification for when entire hostgroup is down

Post by npolovenko »

@matt_ps, not a problem! I'm going to close this thread as resolved then, but if you have any other questions, feel free to create a new one.