Question on "how to"...

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
PhilG
Posts: 286
Joined: Thu Jan 16, 2014 10:24 am

Post by PhilG »

Hello:
We are running Nagios XI 5.2.7 (may update in the near future to the latest version).
We are monitoring 339 Hosts and 1526 Services.
Background for my question:
1). One Apache web server (we'll call it Server "A") hosts a minimum of 80 websites.
2). Server "A" was set up using the Linux Server Wizard and monitors the httpd service.
2). When the 80 or so websites were set up in Nagios XI, they were configured with Server "A" as their Host Parent.
3). The first website was set up using the Website Wizard (the majority of the other websites then used the first website as a template). The main service being monitored for those websites is HTTP.
4). A couple of the websites, though, were set up using the Website URL Wizard. This is just an FYI; I may need to reconfigure them later.
5). With this setup, the Apache web server and each of the 80 or so websites now appear as individual Hosts in Nagios XI.

Issue: When Apache (the HTTPD daemon) has an issue on Server "A" (a DDoS attack, a website starting to use a lot of resources, etc.), all the websites become unreachable and Nagios XI sends out a Critical e-mail for EACH website, and sometimes a Critical e-mail stating that the Host Parent may be unreachable. HOWEVER, I can still access and log on to the Host Parent (albeit slowly, because memory and/or CPU are over-utilized).

QUESTION:
Is there a way to configure Nagios XI NOT to send an alert for EACH website on Server "A" when the Apache/HTTPD daemon is having an issue, and instead send only ONE e-mail telling us to check Apache on the Host Parent?
However, if there is an issue with only one or a few of the websites, it should still send an alert for those specific websites, because the web developer may have taken the website(s) down without my knowledge.

Thank you in advance for your assistance.
Newbie '14
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Question on "how to"...

Post by rkennedy »

You mention in #2 you're using a parent relationship - are all 80 of these websites their own hosts, or are they all services under 'Server A'?

If they're isolated to their own hosts, you could indeed use relationships - https://assets.nagios.com/downloads/nag ... ility.html

The other option is perhaps disabling notifications on a per-website basis, but leaving up a 'master' check. Another way would be to increase your max_check_attempts / notification_interval to a number high enough that it can ride out the DDoS. There are a few other outside-the-box ways you could do this as well.
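As a rough illustration of the "ride it out" idea, here is a minimal Python sketch of how long a problem stays in a SOFT state before Nagios treats it as HARD and notifies. It assumes the standard Nagios Core directives max_check_attempts and retry_interval, and the default interval_length of 60 seconds (so intervals are in minutes); retry_interval is not mentioned in the thread, and the function name is hypothetical.

```python
def minutes_until_hard_state(max_check_attempts: int, retry_interval: float) -> float:
    """Minutes between the first failed check and the HARD state.

    Assumes the default interval_length of 60 seconds, so that
    retry_interval is expressed in minutes.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    # Attempt 1 is the first failure; every further attempt waits one retry_interval.
    return (max_check_attempts - 1) * retry_interval


# Compare a few candidate settings with a 1-minute retry_interval.
for attempts in (3, 5, 10):
    window = minutes_until_hard_state(attempts, retry_interval=1)
    print(f"max_check_attempts={attempts}: HARD state after {window} minute(s)")
```

The trade-off is that a longer window also delays alerts for a genuine single-site outage by the same amount.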
Former Nagios Employee
PhilG
Posts: 286
Joined: Thu Jan 16, 2014 10:24 am

Re: Question on "how to"...

Post by PhilG »

Hello:
The server, Server "A", is a separate Host, and its Services are HTTP, HTTPS, Apache Web Server, and Apache Httpd, based off the Linux Wizard setup.
The websites are set up as separate Hosts, based off the Website Wizard setup, and each has the Services DNS Resolution, HTTP, and Ping. Each website was configured with Server "A" as its Host Parent.
Since Server "A" (the Parent Host) is not actually "down" (Apache simply has more daemon processes running than usual, and they are using up resources), we just need Nagios XI to tell us to restart Apache.

According to the link you provided:
"By default, Nagios will notify contacts about both DOWN and UNREACHABLE host states. As an admin/tech, you might not want to get notifications about hosts that are UNREACHABLE. You know your network structure, and if Nagios notifies you that your router/firewall is down, you know that everything behind it is unreachable.
If you want to spare yourself from a flood of UNREACHABLE notifications during network outages, you can exclude the unreachable (u) option from the notification_options directive in your host definitions and/or the host_notification_options directive in your contact definitions."
Does this mean that, in our scenario, I just need to remove the "u" option from the Parent Host's definition, from the websites' host definitions, or from both?

Thank you.
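A minimal Python sketch of what the quoted passage describes: the host notification_options directive acts as a filter on which host states may generate a notification. This is a simplified model and not Nagios code; only the d, u, and r options are modeled, and the function and dictionary names are hypothetical.

```python
# Simplified model: the option letter that must be present in a host's
# notification_options for each kind of host notification to be sent.
STATE_TO_OPTION = {"DOWN": "d", "UNREACHABLE": "u", "RECOVERY": "r"}


def passes_notification_options(state: str, notification_options: str) -> bool:
    """Return True if `state` is allowed by a value such as "d,u,r"."""
    enabled = {opt.strip() for opt in notification_options.split(",") if opt.strip()}
    return STATE_TO_OPTION[state] in enabled


# With "u" excluded, UNREACHABLE is filtered out but DOWN still notifies.
for state in ("DOWN", "UNREACHABLE", "RECOVERY"):
    print(state, passes_notification_options(state, "d,r"))
```

Note that the filter only helps for hosts that Nagios actually classifies as UNREACHABLE; a website host that is classified as DOWN still notifies.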
Newbie '14
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Question on "how to"...

Post by mcapra »

If you could PM or post a system profile for us to review, that would be tremendously helpful. From the Nagios XI GUI, you can gather a profile via Admin -> System Profile -> Download Profile.
PhilG wrote:Issue: When Apache (the HTTPD Daemon) has an issue (DDOS attack, a website starts to use a lot of resources, etc.) on Server "A", all the websites are then unreachable and Nagios XI sends out a Critical e-mail for EACH website, and sometimes a Critical e-mail stating that the Host parent maybe unreachable. HOWEVER, I can access/logon to the Host Parent (albeit slow due because of resource hog issues like memory and/or CPU processing over-utilization).
So if you added the MasterHost using the Linux Server wizard, the default host check is going to be a ping. If the ping returns with minimal packet loss, the host is "UP". If there's substantial packet loss, the host is "DOWN".

I think the crux of the issue is that, if the MasterHost is undergoing a DDoS and the HTTPD daemon enters a "CRITICAL" state, the MasterHost is still pingable and therefore considered "UP". That means none of the ChildHosts (in this case, your websites) will recognize their parent as being down. It might make more sense to do one of two things:
  • A -- Change the check used on the MasterHost object from a simple ping to the status of the HTTPD daemon. Add a ping as one of the "service checks" if you like.
  • B -- Change the websites you're monitoring to be services under MasterHost that are dependent on the HTTPD daemon. A "Service Dependency" would be leveraged here.
I don't think configuration wizards by themselves are going to cut it here, one way or another. There will need to be some changes made from the Core Config Manager.
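To make the reasoning behind option A concrete, here is a minimal Python sketch of the reachability rule as documented for Nagios Core: a host whose own check fails is DOWN if at least one of its parents is UP, and UNREACHABLE if every parent is DOWN or UNREACHABLE. The function names are hypothetical, and the sketch assumes the website hosts have had the "u" option removed from notification_options, which the thread discusses but does not confirm.

```python
def classify_failed_host(parent_states):
    """State assigned to a host whose own check has failed."""
    if not parent_states:
        return "DOWN"  # no parents defined, so the host can only be DOWN
    # At least one parent UP means the path is fine and the host itself is DOWN.
    return "DOWN" if any(s == "UP" for s in parent_states) else "UNREACHABLE"


def website_alerts(parent_state, websites=80, notification_options=("d", "r")):
    """Number of website host alerts when every website check fails at once."""
    state = classify_failed_host([parent_state])
    option = {"DOWN": "d", "UNREACHABLE": "u"}[state]
    return websites if option in notification_options else 0


# Ping-based host check: the parent still answers ping, so it stays UP.
print("parent UP   ->", website_alerts("UP"), "website alerts")
# HTTPD-based host check: the parent goes DOWN together with Apache.
print("parent DOWN ->", website_alerts("DOWN"), "website alerts")
```

In the second case the only host alert left is the one for the parent itself, while a single website failing on its own (parent UP) is still classified as DOWN and still alerts.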
Former Nagios employee
https://www.mcapra.com/
PhilG
Posts: 286
Joined: Thu Jan 16, 2014 10:24 am

Re: Question on "how to"...

Post by PhilG »

Hello:
I apologize for not getting back - things have been unreasonably busy here.
I appreciate the suggestion/feedback and we'll look into your suggestion/possible solution.
Thank you!
You may close this post.

Newbie '14
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Question on "how to"...

Post by cdienger »

Thanks for the update. Closing as requested.