Notification for when entire hostgroup is down

This board serves as an open discussion and support collaboration point for Nagios XI. NOTE: Nagios XI customers should use the Customer Support forum to obtain expedited support.

Post by matt_ps » Tue Nov 14, 2017 1:10 pm

Hello, we have Nagios XI in AWS to monitor our externally facing services. Each of our hosts is added to the hostgroup for its respective data center. We had a short outage in one data center last night and I got an onslaught of emailed alerts (which were delivered late, since our mail server resides in that data center). I'm looking to make the alerting a bit more intelligent and send a PagerDuty alert if an entire hostgroup (i.e. an entire data center) fails.

Is this possible? If so, how can this be configured and managed?

If it isn't natively supported, can I make a custom host check that uses a REST API call to query the statuses of the hostgroup's members? Is there a script already out there for this purpose that I can borrow from? I hate to reinvent the wheel :roll:
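A minimal sketch of the decision logic such a custom check could use, assuming the member states have already been fetched. The function name `evaluate_hostgroup` and the threshold defaults are hypothetical; the host state codes (0 = UP, 1 = DOWN, 2 = UNREACHABLE) and plugin exit codes (0 to 3) follow the usual Nagios conventions. Fetching the states over the REST API is deliberately left out, since the endpoint and response layout would be assumptions here.

```python
# Hypothetical sketch: names and thresholds are illustrative, not an official plugin.

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_hostgroup(host_states, warn_pct=40.0, crit_pct=100.0):
    """Return (exit_code, message) for a hostgroup.

    host_states maps host name -> Nagios host state
    (0 = UP, 1 = DOWN, 2 = UNREACHABLE).
    """
    if not host_states:
        return UNKNOWN, "UNKNOWN - hostgroup has no members"

    # Any state other than UP counts as "down" in this sketch.
    down = [name for name, state in host_states.items() if state != 0]
    pct = 100.0 * len(down) / len(host_states)
    summary = f"{len(down)}/{len(host_states)} hosts down ({pct:.0f}%)"

    if pct >= crit_pct:
        return CRITICAL, f"CRITICAL - {summary}"
    if pct >= warn_pct:
        return WARNING, f"WARNING - {summary}"
    return OK, f"OK - {summary}"
```

A wrapper script would print the message and exit with the returned code, so the result can be attached to a dummy host or service whose contact routes to PagerDuty.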
matt_ps
 
Posts: 9
Joined: Mon Oct 09, 2017 11:50 am

Re: Notification for when entire hostgroup is down

Post by npolovenko » Tue Nov 14, 2017 5:31 pm

Hi, @matt_ps. I'd probably go with a parent-child relationship here. The mail server in each data center would be the parent, and all other hosts in that host group would be its children. Once that is applied, if the mail server (parent) goes down, Nagios will automatically know not to send critical notifications for the other hosts (children) in the same group. Take a look here:
https://assets.nagios.com/downloads/nag ... ility.html
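As a rough illustration of the parent-child rule, and not Nagios's actual code: a host whose check fails is treated as DOWN if it has no parents or at least one parent is still UP, and as UNREACHABLE if all of its parents are down or unreachable. Whether an UNREACHABLE host notifies depends on the `u` flag in that host's `notification_options`. The function names below are hypothetical.

```python
# Simplified sketch of parent-based reachability; function names are illustrative.

UP, DOWN, UNREACHABLE = 0, 1, 2


def classify_failed_host(parent_states):
    """State for a host whose own check failed, given its parents' states."""
    # No parents, or at least one parent UP: the host itself is the problem.
    if not parent_states or any(state == UP for state in parent_states):
        return DOWN
    # Every parent is DOWN or UNREACHABLE: the host is cut off, not at fault.
    return UNREACHABLE


def should_notify(state, notification_options):
    """notification_options is a set of option letters, e.g. {"d", "r"}."""
    letter = {DOWN: "d", UNREACHABLE: "u"}.get(state)
    return letter is not None and letter in notification_options
```

With `notification_options` set to `d,r` on the children, only the parent's DOWN notification would go out when the parent fails.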
npolovenko
 
Posts: 367
Joined: Mon May 15, 2017 5:00 pm

Re: Notification for when entire hostgroup is down

Post by matt_ps » Tue Nov 14, 2017 5:42 pm

Well, that's fine for that one use case, but our other data centers don't have mail servers. Additionally, what if that mail server goes down for some other reason? Do we receive an alert saying the entire data center is down?

Does this mean that there's no native way to alert on hostgroups? Being able to report "40% of hosts in XYZ data center are down" and having that go through PagerDuty (our escalation platform) would be invaluable as well. Essentially, we want intelligent PagerDuty alerts telling us "hey, maybe something bigger is going on here than just issues with a few individual hosts".

Re: Notification for when entire hostgroup is down

Post by npolovenko » Wed Nov 15, 2017 12:57 pm

@matt_ps, if the mail server goes down, no alert emails will be sent out. They will spool up and be sent all at once when the mail server comes back up.
It looks like the BPI component is what you're looking for. You'll be able to receive alerts based on the percentage of hosts in a host group that are down. https://assets.nagios.com/downloads/nag ... BPI_v2.pdf

Re: Notification for when entire hostgroup is down

Post by matt_ps » Wed Nov 15, 2017 2:21 pm

That's exactly what I'm after, thank you!

Re: Notification for when entire hostgroup is down

Post by npolovenko » Wed Nov 15, 2017 4:02 pm

@matt_ps, not a problem! I'm going to close this thread as resolved then, but if you have any other questions, feel free to create a new one.

