NagiosXI falsely detecting host as unreachable

Posted: Wed Aug 24, 2016 11:19 am
by canene
Hi there,

We are currently vetting NagiosXI as our network monitoring tool. We have NagiosXI running and monitoring a handful of hosts on our network.
However, from time to time we run into an issue where NagiosXI flags a configured host as unreachable even though the host is still accessible on the network.
The most recent host affected is one of our ESXi hosts that Nagios is monitoring. The host can successfully ping the NagiosXI server and the NagiosXI server can ping the ESXi host.
I wanted to know what can be done in these instances to get NagiosXI to detect that the host is available again (without having to delete the host and re-add it to NagiosXI).

This problem is affecting our purchase decision because we don't want to purchase monitoring software that gives us false positives.

Thanks in advance!

Re: NagiosXI falsely detecting host as unreachable

Posted: Wed Aug 24, 2016 1:01 pm
by ssax
My first question is what check are you using for the host check? You can find this by going to Configure > Core Config Manager > Hosts, edit the host, and tell us what check command it's using. Also, are you using the DNS name or IP address in the Address field on the host? If you're using the DNS name, could you be having DNS issues?

When the host in question goes into the UNREACHABLE state, what is the exact output?

To detect if the host is available again you would adjust these three options on the host:

check_interval
retry_interval
max_check_attempts

The way it works is that the host check runs every X minutes, where X comes from the check_interval definition. If a problem is detected, the host is re-checked every Y minutes, where Y comes from the retry_interval definition. It will be re-checked up to Z times, where Z comes from the max_check_attempts definition. When the host initially goes from a hard OK state to a problem state, each failed retry is considered a SOFT state until max_check_attempts is reached; only then is it considered a HARD state and a notification sent (if you have notifications set up).
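
To make the timing concrete, here is a minimal Python sketch of the retry logic described above. It is only an illustration, not how Nagios itself is implemented: the function name failure_timeline is hypothetical, the example values (retry_interval of 1, max_check_attempts of 5) are made up, and it assumes the first failed check counts as attempt 1 and that every retry also fails.

```python
def failure_timeline(retry_interval, max_check_attempts):
    """Return (minutes_after_first_failure, attempt, state_type) for each failed check.

    Assumes every check in the sequence fails and that intervals are in minutes.
    """
    timeline = []
    for attempt in range(1, max_check_attempts + 1):
        # Retries are spaced retry_interval minutes apart after the first failure.
        minutes = (attempt - 1) * retry_interval
        # Only the attempt that reaches max_check_attempts is a HARD state.
        state_type = "HARD" if attempt == max_check_attempts else "SOFT"
        timeline.append((minutes, attempt, state_type))
    return timeline


if __name__ == "__main__":
    # Hypothetical settings: retry every minute, 5 attempts in total.
    for minutes, attempt, state_type in failure_timeline(1, 5):
        print(f"+{minutes} min: attempt {attempt} -> {state_type}")
```

Under these assumptions the host only reaches a HARD problem state (and notifies) on the final attempt, so lowering retry_interval or max_check_attempts shortens the time before a real outage is reported, while raising them gives a flapping host more chances to recover quietly. Recovery works the other way: the next successful check returns the host to an UP state, so a shorter check_interval or retry_interval means it is noticed sooner.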

Let us know if you have any questions.

Thank you