Hi Team,
We have configured Nagios, hosted on AWS, to monitor 12 Linux servers. The configuration includes disk, load, ping, and port monitoring.
Today at around 05:39 PM (IST) we received DOWN alerts for only 5 of the servers, and 2 minutes later, at 5:41 PM (IST), we received UP alerts. We checked all the servers but did not find anything wrong on the server side. We also checked each server's uptime and last reboot. So it seems it was a false positive. We need your help to identify why these alerts were triggered.
Below are the configuration details.
Check Interval 2 min
Retry Interval 1 min
Max check Attempts 2
Service configuration.
1- Check Disk.
2- Check Load.
3- Application port no.
4- Ping.
5-SSH
Please let us know if any other details are required.
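For readers following along, the timing implied by these settings can be worked out with a small sketch. It assumes the intervals are in minutes (the default `interval_length` of 60 seconds) and ignores check execution time and plugin timeouts; the function names are made up for illustration.

```python
def soft_to_hard_minutes(retry_interval: int, max_check_attempts: int) -> int:
    """Minutes between the first failed check and the HARD state.

    After the first failure, Nagios rechecks every retry_interval until
    max_check_attempts checks have failed in a row.
    """
    return (max_check_attempts - 1) * retry_interval


def worst_case_detection_minutes(check_interval: int,
                                 retry_interval: int,
                                 max_check_attempts: int) -> int:
    """Upper bound on the delay between an outage starting and a HARD DOWN.

    Worst case: the outage begins just after a successful regular check,
    so a full check_interval passes before the first failure is seen.
    """
    return check_interval + soft_to_hard_minutes(retry_interval,
                                                 max_check_attempts)


# Settings from the post: check 2 min, retry 1 min, max attempts 2.
print(soft_to_hard_minutes(1, 2))             # 1
print(worst_case_detection_minutes(2, 1, 2))  # 3
```

With these values, two failed checks one minute apart are enough for a DOWN notification, so a connectivity loss lasting only a minute or two can trigger it.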
Nagios False Alerts
Re: Nagios False Alerts
The same configuration settings apply to all 12 Linux servers.
Check Interval 2 min
Retry Interval 1 min
Max check Attempts 2
Service configuration.
1- Check Disk.
2- Check Load.
3- Application port no.
4- Ping.
5-SSH
Re: Nagios False Alerts
The down alerts look like this:
***** Nagios Hardware Alert *****
Nagios has detected a problem with this host.
Notification Type: PROBLEM
Host:
State: DOWN
Address: host IP
Info: CRITICAL - host IP: rta nan, lost 100%
Date/Time: 2019-02-16 17:40:49
Re: Nagios False Alerts
Hi Team,
Any update on this?
Re: Nagios False Alerts
Judging by the notification provided, Nagios lost the ability to ping these servers. This is usually due to a networking issue. For example, it could occur if the route between the XI server and the monitored machine goes down, if a firewall drops the ICMP packets the check uses to determine whether the monitored server is up, or if the IP of the destination changed. Essentially, anything that prevents a ping between XI and the monitored machine from working could cause it.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Nagios False Alerts
Okay. But why did this happen with only 5 servers? The notification settings on the other servers are the same.
Re: Nagios False Alerts
Either because it could still ping the other servers during that time, or because the checks for those servers may not have run during the period when there was an issue.
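To illustrate the second possibility, here is a rough simulation of how a short outage can catch only some hosts. Everything in it is hypothetical: the host names, the evenly spaced 10-second check offsets, and the 110-second outage are invented for the example and are not taken from the reported incident. It also assumes the outage is shorter than the check interval, so each host has at most one regular check inside it.

```python
import math


def hosts_reaching_hard_down(offsets, check_interval, retry_interval,
                             max_attempts, outage_start, outage_end):
    """Return hosts whose first check and all retries fall inside the outage.

    offsets maps host name -> time (seconds) of that host's first regular
    check; later regular checks repeat every check_interval seconds.
    """
    down = []
    for host, offset in offsets.items():
        # First regular check at or after the start of the outage.
        if offset >= outage_start:
            first = offset
        else:
            cycles = math.ceil((outage_start - offset) / check_interval)
            first = offset + cycles * check_interval
        # A failed check is retried every retry_interval until max_attempts.
        attempts = [first + n * retry_interval for n in range(max_attempts)]
        if all(outage_start <= t < outage_end for t in attempts):
            down.append(host)
    return down


# 12 hypothetical hosts, checks staggered 10 s apart across the 120 s interval.
offsets = {f"host{i:02d}": i * 10 for i in range(12)}

affected = hosts_reaching_hard_down(
    offsets,
    check_interval=120,   # 2 min
    retry_interval=60,    # 1 min
    max_attempts=2,
    outage_start=0,
    outage_end=110,       # hypothetical 110 s connectivity loss
)
print(affected)
```

In this made-up scenario only the hosts whose check and retry both land inside the outage reach a hard DOWN state; the rest either recover on the retry or are never checked while the problem lasts.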
Re: Nagios False Alerts
Given that these servers are all behind AWS, what did the output of the other services say at that point? Could it have been a firewall issue with AWS or something?
We would need to see the output of the checks during that time to see if we can glean any additional information from the services.
Did all of the AWS hosts give the same "100% lost" host results at the same time? Or did you have some that did and some that didn't?
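If it helps with gathering that output, below is a minimal sketch for pulling host alerts for a time window out of the Nagios log. It assumes the usual `[epoch] HOST ALERT: host;state;state_type;attempt;output` line layout; the sample lines, host names, IPs, and timestamps are invented, and the log location depends on your install (often `/usr/local/nagios/var/nagios.log` for source installs).

```python
import re
from typing import Iterable, List, NamedTuple

# [epoch] HOST ALERT: host;state;state_type;attempt;plugin output
HOST_ALERT = re.compile(
    r"^\[(\d+)\] HOST ALERT: ([^;]+);([^;]+);([^;]+);(\d+);(.*)$"
)


class HostAlert(NamedTuple):
    timestamp: int
    host: str
    state: str
    state_type: str
    attempt: int
    output: str


def host_alerts_between(lines: Iterable[str], start: int,
                        end: int) -> List[HostAlert]:
    """Collect HOST ALERT entries whose epoch timestamp is in [start, end]."""
    alerts = []
    for line in lines:
        match = HOST_ALERT.match(line.strip())
        if not match:
            continue  # service alerts, notifications, and other entries
        ts = int(match.group(1))
        if start <= ts <= end:
            alerts.append(HostAlert(ts, match.group(2), match.group(3),
                                    match.group(4), int(match.group(5)),
                                    match.group(6)))
    return alerts


# Invented sample lines, only to show the shape of the data.
sample_log = [
    "[1550318389] HOST ALERT: app01;DOWN;SOFT;1;CRITICAL - 10.0.0.11: rta nan, lost 100%",
    "[1550318400] SERVICE ALERT: app02;SSH;CRITICAL;SOFT;1;connect timed out",
    "[1550318449] HOST ALERT: app01;DOWN;HARD;2;CRITICAL - 10.0.0.11: rta nan, lost 100%",
    "[1550318520] HOST ALERT: app01;UP;HARD;1;OK - 10.0.0.11: rta 0.512ms, lost 0%",
]

for alert in host_alerts_between(sample_log, 1550318380, 1550318460):
    print(alert.host, alert.state, alert.state_type, alert.attempt)
```

Comparing the SOFT and HARD entries per host across the window should show whether the hosts that did not alert were checked at all during the problem.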
Re: Nagios False Alerts
Yes, all the servers are hosted on AWS. If there were a network issue, then all the servers and their services (Ping, SSH, Disk, Load) should have been reported as down or critical. But we got DOWN alerts for only five servers, and there were no alerts for services such as Ping, SSH, Disk, and Load.
Re: Nagios False Alerts
Not getting notifications for services when a host is down is expected - if a host is down there's no need to spam people with emails with service notifications since it's assumed that these are down as well.
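As a very simplified sketch of that behaviour (not the actual Nagios code, and with a made-up function name), the decision looks roughly like this. It ignores notification periods, notification options, downtime, flapping, and the other filters Nagios applies.

```python
def service_problem_notification_allowed(host_state: str,
                                         service_state: str,
                                         service_state_type: str) -> bool:
    """Simplified model: should a service problem notification go out?"""
    # If the host is DOWN or UNREACHABLE, service problems are not notified;
    # the host notification already covers the outage.
    if host_state != "UP":
        return False
    # Only non-OK services in a HARD state produce problem notifications.
    return service_state != "OK" and service_state_type == "HARD"


print(service_problem_notification_allowed("DOWN", "CRITICAL", "HARD"))  # False
print(service_problem_notification_allowed("UP", "CRITICAL", "HARD"))    # True
```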
Nagios identified a few machines that it wasn't able to ping. It's clear that these checks ran and that no response was received from the remote machines. Determining why they failed isn't something that can really be investigated unless the issue is currently happening.