Host checks not changing even when system down.
I took over this Nagios instance, and I am trying to figure out why we get so many notifications when our IPSec connection to a remote site randomly goes down. I was under the assumption that when a host becomes unreachable/down, the notifications for the service checks under it would not be sent either, i.e. service notifications depend on the host being up.
What we are experiencing is that when the IPSec connection goes down, none of the service checks are able to connect and every one of them sends an alert. When I look at the host check in Nagios, it says the hosts have been OK for 160+ days, while all the services under those hosts are down (or, after the IPSec is back, all of them have been up for the same number of minutes).
I verified this by making a new host, setting its IP to a valid IP and forcing the check so that it went green. I then changed the IP address in the config to a non-valid IP and restarted Nagios, and the host stayed green/OK even though the host IP is no longer reachable.
Is there any way to determine why Nagios says the hosts are still OK when they are unreachable, even though the status was last updated 10 seconds ago?
System is CentOS 6.4
Nagios version is 4.0.1
Any help to stop the 500 emails we are getting when the IPSec dies at 2am would be greatly appreciated.
Re: Host checks not changing even when system down.
Without knowing why IPSec comes into play for your host check, it's hard to say. All other things being equal, you could always use a service dependency so that service X on host X requires service Y on host Y to be in a particular state before the check for service X is run.
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
I'm a Nagios Fanatic! • Join our public Nagios Discord Server!
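As a rough illustration of the dependency approach described above, a service dependency in the Nagios object configuration might look like the sketch below. The host names (tunnel-gateway, remote-server-01) and service descriptions (IPSec Tunnel, HTTP) are hypothetical placeholders, not names from this thread, and both services are assumed to be defined elsewhere in the configuration.

```
# Hypothetical names throughout; adjust to the real hosts and services.
define servicedependency {
    host_name                       tunnel-gateway      ; host that owns the master service (assumed)
    service_description             IPSec Tunnel        ; master service (assumed)
    dependent_host_name             remote-server-01    ; host behind the tunnel (assumed)
    dependent_service_description   HTTP                ; service that depends on the tunnel (assumed)
    execution_failure_criteria      w,u,c               ; skip active checks while the master is WARNING/UNKNOWN/CRITICAL
    notification_failure_criteria   w,u,c               ; suppress notifications in the same master states
}
```

One definition like this is needed per dependent service (or per group of services), which can get verbose with many services behind the tunnel.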
Re: Host checks not changing even when system down.
The only thing the IPSec has to do with the Nagios checks is that it connects the network Nagios runs in to the remote network. When the IPSec tunnel goes down, Nagios cannot reach the remote network.
The issue is that the host checks never fail, so we get hundreds of alerts, one for each of the services that are then unreachable.
Nagios is not actually checking the hosts to see if they are up. It seems that it is not running any host checks at all, and is just reporting the state the hosts were in when a check was last forced.
For example: I set up a host that worked, then changed the host config file to point to an IP that does not belong to any host, so that the host should be reported as down. I restarted Nagios, and it still says the host is fine.
Host State Information
Host Status: UP (for 0d 1h 23m 35s+)
Status Information: PING OK - Packet loss = 0%, RTA = 10.11 ms
Performance Data: rta=10.111000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0
Current Attempt: 1/10 (HARD state)
Last Check Time: 05-29-2014 15:09:03
Check Type: ACTIVE
Check Latency / Duration: 0.000 / 0.029 seconds
Next Scheduled Active Check: 05-30-2014 15:19:03
Last State Change: N/A
Last Notification: N/A (notification 0)
Is This Host Flapping? N/A
In Scheduled Downtime? NO
Last Update: 05-30-2014 15:17:54 ( 0d 0h 0m 5s ago)
Re: Host checks not changing even when system down.
I would make a host which uses the ipsec check as its host check. Make it the parent of everything on the other side of the tunnel - that way, if the tunnel is down, all the children are marked as unreachable.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
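The parent/child suggestion above could be sketched roughly as follows. This is only an illustration under several assumptions: the host names and the 192.0.2.x addresses are made up, the tunnel is "checked" simply by pinging an address on the far side with the stock check-host-alive command from the sample configuration (a dedicated IPSec check would be substituted here if one exists), and the 24x7 time period, admins contact group, and linux-server template are taken from the Nagios sample config and may not exist in this installation.

```
# Pseudo-host representing the tunnel itself (all names and addresses are hypothetical).
define host {
    host_name               remote-site-tunnel
    alias                   IPSec tunnel to remote site
    address                 192.0.2.1           ; an address only reachable through the tunnel (assumed)
    check_command           check-host-alive    ; or a dedicated IPSec check, if available
    max_check_attempts      3
    check_interval          1
    retry_interval          1
    check_period            24x7                ; assumed time period name
    contact_groups          admins              ; assumed contact group name
    notification_interval   60
    notification_period     24x7
}

# Every host on the other side of the tunnel names the tunnel as its parent.
define host {
    use                     linux-server        ; assumed template from the sample config
    host_name               remote-server-01
    alias                   Remote server 01
    address                 192.0.2.10
    parents                 remote-site-tunnel
}
```

With this layout, when the tunnel host goes DOWN, hosts behind it that also fail their checks should be marked UNREACHABLE rather than DOWN. This only helps once the host checks themselves are actually being run and changing state, which is the separate problem raised in this thread.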
Re: Host checks not changing even when system down.
abrist wrote:I would make a host which uses the ipsec check as its host check. Make it the parent of everything on the other side of the tunnel - that way, if the tunnel is down, all the children are marked as unreachable.
Bingo. I was just trying to figure out how to write that when Andy posted this.
Re: Host checks not changing even when system down.
I will try that for dealing with the notifications, but it doesn't explain why Nagios isn't changing the status of the host or actually running the host check.
Thanks again.
Re: Host checks not changing even when system down.
What happens when you:
1.) Manually run a ping against that non-existent host from the Nagios server?
2.) Run the check_ping plugin against the non-existent host from the Nagios server?
Former Nagios employee
Re: Host checks not changing even when system down.
tmcdonald wrote:What happens when you:
1.) Manually run a ping against that non-existent host from the Nagios server?
2.) Run the check_ping plugin against the non-existent host from the Nagios server?
1) --- xxx.xxx.xxx.xxx ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4224ms
2) It gives the expected error: CRITICAL - Host Unreachable
Re: Host checks not changing even when system down.
Go ahead and PM me your /usr/local/nagios/var/objects.cache file and I'll take a look.
Tech Note: PM received, stored in appropriate location on network drive
Former Nagios employee