Greetings!
I have a host that Nagios consistently reports as Down every 5 or 10 minutes, and then its next check succeeds. We receive flapping notifications while this is happening. It will show Critical at a duration of -39s or around that mark (screenshot attached) and then go back to an OK status. I confirmed that the host is accessible and not having any problems when this happens. I don't suspect a networking issue, as the other hosts in this office do not have this problem. I've tried re-adding the host through the monitoring wizard and have tried various check durations, to no avail. I'm still kind of a noob with Nagios and I'm not sure what else I could do to fix this. Thanks in advance for any advice!
Host alerts down but server is up
-
future ruins
- Posts: 9
- Joined: Mon Apr 14, 2014 5:25 pm
Host alerts down but server is up
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
- Contact:
Re: Host alerts down but server is up
Can you show us the performance graphs for this host (last 4 hours)? It might be the round trip time which is causing issues, not packet loss.
When this problem next occurs, go to the host object and click the Advanced tab, take a screenshot.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
-
future ruins
- Posts: 9
- Joined: Mon Apr 14, 2014 5:25 pm
Re: Host alerts down but server is up
Thanks for the reply, I'll record this information and post it here shortly.
-
future ruins
- Posts: 9
- Joined: Mon Apr 14, 2014 5:25 pm
Re: Host alerts down but server is up
Hello Box293,
Does the attached help?
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
- Contact:
Re: Host alerts down but server is up
This screenshot tells us the information we are after:

You can see in the performance data string pl=100% which is 100% packet loss.
By default the ping/ICMP check fires off 5 packets, and it appears to be losing all 5 of them.
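For anyone who wants to pull the pl value out of a performance data string programmatically, here is a minimal Python sketch. The sample string is illustrative only (it is not copied from the screenshot), the function name `parse_perfdata` is made up for this example, and the parser assumes simple space-separated `label=value;...` tokens without quoted labels.

```python
import re


def parse_perfdata(perfdata: str) -> dict:
    """Parse a Nagios-style perfdata string into {label: (value, unit)}.

    Simplified: assumes labels contain no spaces or quotes.
    """
    metrics = {}
    for token in perfdata.split():
        label, _, rest = token.partition("=")
        if not rest:
            continue
        # The first ';'-separated field is the measured value plus unit;
        # the remaining fields (warn, crit, min, max) are ignored here.
        value_field = rest.split(";")[0]
        match = re.fullmatch(r"(-?\d+(?:\.\d+)?)(\D*)", value_field)
        if match:
            metrics[label] = (float(match.group(1)), match.group(2))
    return metrics


# Hypothetical sample string, for illustration only.
sample = "rta=0.000ms;3000.000;5000.000;0; pl=100%;80;100;;"
metrics = parse_perfdata(sample)
print(metrics["pl"])
```

A pl value of 100 with unit % means every packet in that check was lost, which is what makes the check report the host as down.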
Is it just connectivity to this host that is a problem?
Is this host in the same subnet as the XI server?
From what you say, Nagios shows it down for one check interval (1/5) but recovers on the next check attempt, so it never really gets to a HARD down?
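To illustrate the SOFT versus HARD distinction behind that question, here is a deliberately simplified Python model. It is not how Nagios is implemented internally; it only counts consecutive failed checks against a maximum of 5 attempts (taken from the "1/5" above). The function name `classify_checks` is hypothetical.

```python
def classify_checks(results, max_check_attempts=5):
    """Label each check result with a simplified SOFT/HARD state.

    A DOWN result only becomes HARD once it has been seen
    max_check_attempts times in a row; any UP result resets the count.
    """
    labels = []
    failures = 0
    for result in results:
        if result == "UP":
            failures = 0
            labels.append("UP")
        else:
            failures = min(failures + 1, max_check_attempts)
            kind = "HARD" if failures == max_check_attempts else "SOFT"
            labels.append(f"DOWN {kind} {failures}/{max_check_attempts}")
    return labels


# One failed check followed by a recovery never gets past attempt 1/5.
flapping = classify_checks(["UP", "DOWN", "UP", "DOWN", "UP"])
# Five failures in a row reach the HARD state.
outage = classify_checks(["DOWN"] * 5)
print(flapping)
print(outage[-1])
```

In this model the pattern described in the thread (down for one check, up on the next) stays at SOFT 1/5 every time.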
-
future ruins
- Posts: 9
- Joined: Mon Apr 14, 2014 5:25 pm
Re: Host alerts down but server is up
Correct, it is only this particular host that is exhibiting this activity.
The host is not in the same subnet as the Nagios server. However, I am monitoring many hosts on the same subnet as the affected host and they aren't showing this kind of activity.
You are correct that it shows down for one check interval and then recovers on the next, and this happens every 5 to 10 minutes. It shows a red Critical (down) status but then recovers to up. I've confirmed that I can ping the affected host from the subnet that Nagios is on, even when it's showing down in Nagios.
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
- Contact:
Re: Host alerts down but server is up
Is anything logged in these files when the problem occurs:
/var/log/messages
/usr/local/nagios/var/nagios.log
I suggest starting a constant ping in an SSH session on your XI server to the host in question. When the problem occurs do you see packets dropped?
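If the ping output is saved to a file, a short script can point out where packets went missing. The following is a rough Python sketch that assumes Linux-style reply lines containing `icmp_seq=N`; the sample lines and the address 192.0.2.10 are invented for illustration, and `missing_sequences` is a hypothetical helper name.

```python
import re

SEQ_PATTERN = re.compile(r"icmp_seq=(\d+)")


def missing_sequences(ping_lines):
    """Return the icmp_seq numbers that never got a reply.

    Only gaps between the first and last reply seen are detected;
    packets lost after the final reply are not counted.
    """
    seen = set()
    for line in ping_lines:
        match = SEQ_PATTERN.search(line)
        if match:
            seen.add(int(match.group(1)))
    if not seen:
        return []
    return [seq for seq in range(min(seen), max(seen) + 1) if seq not in seen]


# Hypothetical ping output, for illustration only.
sample_lines = [
    "64 bytes from 192.0.2.10: icmp_seq=1 ttl=58 time=27.1 ms",
    "64 bytes from 192.0.2.10: icmp_seq=2 ttl=58 time=26.9 ms",
    "64 bytes from 192.0.2.10: icmp_seq=5 ttl=58 time=31.4 ms",
    "64 bytes from 192.0.2.10: icmp_seq=6 ttl=58 time=28.0 ms",
]
print(missing_sequences(sample_lines))
```

Comparing the times of the missing sequence numbers with the times Nagios reported the host down shows whether the two line up. With the iputils ping found on most Linux systems, `ping -D` prefixes each line with a timestamp, which makes that comparison easier.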
-
future ruins
- Posts: 9
- Joined: Mon Apr 14, 2014 5:25 pm
Re: Host alerts down but server is up
I've pasted some of the errors from the log files you requested in the attached log. I tried a continuous ping to the affected host from the XI CLI over a 2-minute period and it reports the following:
ping statistics ---
156 packets transmitted, 111 received, 28% packet loss, time 156333ms
rtt min/avg/max/mdev = 26.792/32.432/266.600/24.965 ms
You have mail in /var/spool/mail/root
This is the activity of the continuous ping when the host is showing both up and down in Nagios.
When I do the same for any of the other hosts on that subnet, they report 0% packet loss.
The host in question is a VMware ESXi server. The other ESXi hosts in that farm have no issues with packet loss. Strange that it is just this one.
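As a quick sanity check on those numbers, the loss figure can be recomputed from the summary line. This is a small Python sketch; the regular expression assumes the Linux ping summary wording shown above, and `packet_loss` is a name chosen for this example.

```python
import re

SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")


def packet_loss(summary_line):
    """Return (packets_lost, loss_percent) from a ping summary line."""
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    lost = sent - received
    percent = 100.0 * lost / sent if sent else 0.0
    return lost, percent


line = "156 packets transmitted, 111 received, 28% packet loss, time 156333ms"
lost, percent = packet_loss(line)
print(f"{lost} packets lost ({percent:.1f}%)")
```

That works out to 45 of 156 packets lost, or about 28.8%, which is consistent with the 28% that ping printed (the printed figure appears to be rounded down).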
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: Host alerts down but server is up
future ruins wrote:
ping statistics ---
156 packets transmitted, 111 received, 28% packet loss, time 156333ms
rtt min/avg/max/mdev = 26.792/32.432/266.600/24.965 ms
You have mail in /var/spool/mail/root
The packet loss is most certainly your issue. I suspect some sort of ARP poisoning or something similar. I would inspect the ARP debug of the switch that your vmkernel port is connected to and also take a look at that VMware host's events. It's fairly safe to eliminate the Nagios server from being the problem since other hosts are fine. To be certain you could try that same persistent ping from a nearby server (when I say nearby, I mean near the Nagios server).
-
future ruins
- Posts: 9
- Joined: Mon Apr 14, 2014 5:25 pm
Re: Host alerts down but server is up
Yep, a continuous ping from another host on the Nagios server subnet is producing the same packet loss. I guess I should have tried that first, when I did the regular ping from another host. Very strange, but I agree that at this point we can rule out Nagios being the problem. I will investigate further with my network team. Thanks for your help regardless.