Nagios False Alerts

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
ivp2015
Posts: 142
Joined: Fri Feb 20, 2015 12:32 am

Nagios False Alerts

Post by ivp2015 »

Hi Team,

We have configured monitoring of 12 Linux servers via Nagios, which is hosted on AWS. In the configuration we have enabled disk, load, ping, and port monitoring as well.

Today at around 05:39 PM (IST) we got DOWN alerts for only 5 servers, and 2 minutes later, at 05:41 PM (IST), we received UP alerts. We checked all the servers but didn't find anything wrong on the server side. We also checked the uptime and last reboot of every server. So it seems this was a false positive. We need your help to identify why these alerts were triggered.
Below are the configuration details.

Check Interval 2 min
Retry Interval 1 min
Max check Attempts 2
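
For reference, here is a rough sketch of how these settings translate into time-to-notification. It assumes the usual soft/hard state logic (the first failed check is a SOFT state, and the check is repeated at the retry interval until max check attempts is reached). The function name and the simplified model are illustrative only, not Nagios code:

```python
def minutes_until_hard_state(retry_interval_min, max_check_attempts):
    """Minutes from the first failed check to the HARD state (when notifications go out).

    Assumption: every re-check fails and each one is scheduled exactly
    retry_interval_min after the previous one; check latency is ignored.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    # The first failed check is attempt 1; each further attempt costs one retry interval.
    return (max_check_attempts - 1) * retry_interval_min


# Settings from this post: retry interval 1 min, max check attempts 2.
print(minutes_until_hard_state(1, 2))
```

Under this model, two consecutive failed checks about a minute apart would be enough to produce a DOWN notification, so even a short interruption could trigger one.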

Service configuration.
1- Check Disk.
2- Check Load.
3- Application port no.
4- Ping.
5- SSH.

Please let us know if any other details are required.
ivp2015
Posts: 142
Joined: Fri Feb 20, 2015 12:32 am

Re: Nagios False Alerts

Post by ivp2015 »

The same configuration settings apply to all 12 Linux servers.

Check Interval 2 min
Retry Interval 1 min
Max check Attempts 2

Service configuration.
1- Check Disk.
2- Check Load.
3- Application port no.
4- Ping.
5- SSH.
ivp2015
Posts: 142
Joined: Fri Feb 20, 2015 12:32 am

Re: Nagios False Alerts

Post by ivp2015 »

The DOWN alert looks like this:
***** Nagios Hardware Alert *****

Nagios has detected a problem with this host.

Notification Type: PROBLEM
Host:
State: DOWN
Address: host IP
Info: CRITICAL - host IP: rta nan, lost 100%
Date/Time: 2019-02-16 17:40:49
ivp2015
Posts: 142
Joined: Fri Feb 20, 2015 12:32 am

Re: Nagios False Alerts

Post by ivp2015 »

Hi Team,
Any update on this?
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios False Alerts

Post by cdienger »

Judging by the notification provided, the XI server lost the ability to ping these servers. This is usually caused by a networking issue: for example, the route between the XI server and the monitored machine goes down, a firewall drops the ICMP packets the check uses to determine whether the monitored server is up, or the IP address of the destination changes. Essentially, anything that prevents a ping between XI and the monitored machine from working can cause this.
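
To make the "rta nan, lost 100%" line easier to read at a glance, here is a minimal sketch that pulls the round-trip average and packet loss out of a ping check's output. The output shape is assumed from the "Info:" line in the notification above, and the address and function name are placeholders. A loss of 100% means no replies came back at all, which is why the round-trip average is reported as `nan`:

```python
import math
import re

# Assumed output shape, modelled on the "Info:" line in the notification above,
# e.g. "CRITICAL - 10.0.0.5: rta nan, lost 100%" or "OK - 10.0.0.5: rta 0.412ms, lost 0%".
PING_RESULT = re.compile(r"rta\s+(?P<rta>nan|[\d.]+)(?:ms)?,\s+lost\s+(?P<lost>\d+)%")


def parse_ping_result(info):
    """Return (round-trip average in ms, packet loss percent) from a ping check's output."""
    match = PING_RESULT.search(info)
    if match is None:
        raise ValueError(f"unrecognised ping output: {info!r}")
    return float(match.group("rta")), int(match.group("lost"))


# 10.0.0.5 is a made-up address standing in for the redacted host IP.
rta, lost = parse_ping_result("CRITICAL - 10.0.0.5: rta nan, lost 100%")
print(math.isnan(rta), lost)
```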
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
ivp2015
Posts: 142
Joined: Fri Feb 20, 2015 12:32 am

Re: Nagios False Alerts

Post by ivp2015 »

Okay. But why did this happen with only 5 servers? The notification settings for the other servers are the same.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios False Alerts

Post by cdienger »

Either because Nagios could still ping the other servers during that time, or because the checks for those servers may not have run during the window in which there was an issue.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Nagios False Alerts

Post by ssax »

Given that these servers are all behind AWS, what did the output of the other services say at that point? Could it have been a firewall issue with AWS or something?

We would need to see the output of the checks during that time to see if we can glean any additional information from the services.

Did all of the AWS hosts give the same "100% lost" host results at the same time? Or did you have some that did and some that didn't?
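
One way to answer the "same time or not" question is to group the host alerts in the Nagios log by minute. Below is a minimal sketch, assuming the log lines have the shape `[epoch] HOST ALERT: host;state;state type;attempt;output`. The sample lines, host names, and addresses are made up, and in practice you would read the lines from your own nagios.log instead:

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# Assumed nagios.log line shape:
# [epoch] HOST ALERT: <host>;<state>;<state type>;<attempt>;<plugin output>
HOST_ALERT = re.compile(
    r"^\[(?P<ts>\d+)\] HOST ALERT: (?P<host>[^;]+);(?P<state>[^;]+);"
    r"(?P<kind>[^;]+);(?P<attempt>\d+);(?P<output>.*)$"
)


def host_alerts_by_minute(lines):
    """Group (host, state, state type) by UTC minute so simultaneous failures stand out."""
    grouped = defaultdict(list)
    for line in lines:
        match = HOST_ALERT.match(line.strip())
        if match is None:
            continue  # not a host alert line (e.g. a SERVICE ALERT)
        when = datetime.fromtimestamp(int(match.group("ts")), tz=timezone.utc)
        key = when.strftime("%Y-%m-%d %H:%M")
        grouped[key].append((match.group("host"), match.group("state"), match.group("kind")))
    return dict(grouped)


# Hypothetical sample lines; host names and addresses are made up.
sample = [
    "[1550319000] HOST ALERT: app-01;DOWN;SOFT;1;CRITICAL - 10.0.0.5: rta nan, lost 100%",
    "[1550319010] HOST ALERT: app-02;DOWN;SOFT;1;CRITICAL - 10.0.0.6: rta nan, lost 100%",
    "[1550319060] HOST ALERT: app-01;DOWN;HARD;2;CRITICAL - 10.0.0.5: rta nan, lost 100%",
    "[1550319070] SERVICE ALERT: app-01;SSH;CRITICAL;SOFT;1;connect timed out",
]
for minute, alerts in host_alerts_by_minute(sample).items():
    print(minute, alerts)
```

If the five hosts all show SOFT and HARD DOWN entries within the same minute or two while the other seven show nothing, that would point toward a brief shared network problem rather than something on the individual servers.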
ivp2015
Posts: 142
Joined: Fri Feb 20, 2015 12:32 am

Re: Nagios False Alerts

Post by ivp2015 »

Yes, all the servers are hosted on AWS. If there had been a network issue, then all the servers and their services (Ping, SSH, Disk, Load) should have gone DOWN or CRITICAL and sent notifications. But we got DOWN alerts for only five servers, and there were no alerts for their services (Ping, SSH, Disk, Load).
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios False Alerts

Post by cdienger »

Not getting notifications for services when a host is down is expected - if a host is down there's no need to spam people with emails with service notifications since it's assumed that these are down as well.

Nagios identified a few machines that it wasn't able to ping. It's clear that these checks ran and that no response was received from the remote machines. Determining why they failed isn't something that can really be investigated unless the issue is currently happening.