False warning NagiosXI

RockerMan
Posts: 70
Joined: Fri Nov 01, 2013 12:16 am

False warning NagiosXI

Post by RockerMan »

Hi

NagiosXI
Installed Version: 5.4.5

Sometimes we get a false alert about a host being unavailable. The host is switched off, and its problem has been Acknowledged so that it does not spam our e-mail.

Code: Select all

2017-09-04 20:58:21	HOST ALERT: spb-wan-r2;DOWN;HARD;3;PING CRITICAL - Packet loss = 100%
2017-09-04 20:56:43	HOST ALERT: spb-wan-r2;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
2017-09-04 20:55:05	HOST ALERT: spb-wan-r2;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
2017-09-04 20:51:26	HOST ALERT: spb-wan-r2;UP;HARD;3;PING WARNING - Packet loss = 87%, RTA = 0.90 ms
Although the host is switched off and Acknowledged, Nagios reports it as UP, together with a packet loss percentage. After that, the acknowledgement is of course removed, and e-mail notifications about the host being unreachable start again.
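
The UP line above carries a plugin output of PING WARNING rather than PING OK. As far as the Nagios Core documentation describes host checks, a WARNING plugin result is treated as UP unless aggressive host checking is enabled, so a partial reply can be enough to produce an UP state. The Python sketch below splits an alert line into its fields and applies that mapping. The function names and the `aggressive` parameter are hypothetical, and the sketch says nothing about where the replies came from.

```python
import re

# Layout of the alert lines shown in this thread:
# "<date> <time> HOST ALERT: <host>;<state>;<state type>;<attempt>;<plugin output>"
ALERT_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"HOST ALERT: (?P<host>[^;]+);(?P<state>[^;]+);"
    r"(?P<state_type>[^;]+);(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_host_alert(line):
    """Split one HOST ALERT line into a dict of fields, or return None."""
    match = ALERT_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["attempt"] = int(fields["attempt"])
    return fields


def plugin_status_from_output(output):
    """Extract the status word, e.g. 'PING WARNING - ...' gives 'WARNING'."""
    match = re.match(r"PING (\w+)", output)
    return match.group(1) if match else None


def host_state_from_plugin_status(plugin_status, aggressive=False):
    """Map a plugin status to a host state.

    Assumption: OK maps to UP, WARNING maps to UP unless aggressive host
    checking is on, and every other status maps to DOWN.
    """
    if plugin_status == "OK":
        return "UP"
    if plugin_status == "WARNING":
        return "DOWN" if aggressive else "UP"
    return "DOWN"
```

Feeding the 20:51:26 line through `parse_host_alert` and then `plugin_status_from_output` gives a WARNING status, which this mapping turns into UP, matching the state Nagios logged.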

I cannot understand why Nagios behaves this way toward a switched-off host. Please help me understand.

Thank you.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: False warning NagiosXI

Post by scottwilkerson »

Acknowledged is not the same as disabled.

When you Acknowledge a problem, it is only Acknowledged until it gets an OK status.

To get what you describe, go to the host detail page, click the Advanced tab, and under "Host Attributes" click the X next to "Active Checks".

This will stop the host from being checked at all until you click it again.
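
The difference between the two settings can be pictured with a toy Python model. This is only a sketch of the behaviour described above, not Nagios code, and the class and attribute names are hypothetical.

```python
class HostModel:
    """Toy model of one monitored host (not Nagios code)."""

    def __init__(self):
        self.state = "UP"
        self.acknowledged = False
        self.active_checks_enabled = True

    def acknowledge(self):
        # Only a problem state can be acknowledged.
        if self.state != "UP":
            self.acknowledged = True

    def process_check(self, new_state):
        """Apply a check result; return True if the check was processed."""
        if not self.active_checks_enabled:
            # With active checks disabled the host is not checked at all.
            return False
        if new_state == "UP":
            # A recovery clears the acknowledgement.
            self.acknowledged = False
        self.state = new_state
        return True
```

In this model a single UP result clears the acknowledgement, so the next DOWN result is an unacknowledged problem again, while a host with `active_checks_enabled` set to False never changes state.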
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
RockerMan
Posts: 70
Joined: Fri Nov 01, 2013 12:16 am

Re: False warning NagiosXI

Post by RockerMan »

scottwilkerson wrote:Acknowledged is not the same as disabled.
When you Acknowledge a problem, it is only Acknowledged until it gets an OK status.
To get what you describe, go to the host detail page, click the Advanced tab, and under "Host Attributes" click the X next to "Active Checks".
This will stop the host from being checked at all until you click it again.
You are right that "Acknowledged is not the same as disabled". Sorry, that was my bad English.
I meant that the host is disconnected from its power supply.

The situation is this: a host that is disconnected from its power supply sometimes changes state to "UP", and the acknowledgement is removed. Then e-mail and SMS notifications flood the support service, who get nervous and start calling everyone...

Disabling "Active Checks" is not quite what we need. We set an acknowledgement on a temporarily powered-off host so that it returns to monitoring automatically when it is switched back on. If I disable "Active Checks", it will not return to monitoring on its own.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: False warning NagiosXI

Post by scottwilkerson »

If you are doing ping checks and a host is disconnected from its power supply, the only way for it to go to OK is if the IP is getting assigned to another computer, and you should be concerned about what is causing this.
RockerMan
Posts: 70
Joined: Fri Nov 01, 2013 12:16 am

Re: False warning NagiosXI

Post by RockerMan »

scottwilkerson wrote:If you are doing ping checks and a host is disconnected from its power supply, the only way for it to go to OK is if the IP is getting assigned to another computer, and you should be concerned about what is causing this.
The possibility of another host using the address was the first thing we checked. No one used the address of the disconnected router. If someone had, arpwatch would have reported a flip flop, and there were no messages from arpwatch.
A continuous ping against the disconnected host was started yesterday for verification. So far all packets have been lost, and there has been no UP state.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: False warning NagiosXI

Post by scottwilkerson »

RockerMan wrote:
scottwilkerson wrote:If you are doing ping checks and a host is disconnected from its power supply, the only way for it to go to OK is if the IP is getting assigned to another computer, and you should be concerned about what is causing this.
The possibility of another host using the address was the first thing we checked. No one used the address of the disconnected router. If someone had, arpwatch would have reported a flip flop, and there were no messages from arpwatch.
A continuous ping against the disconnected host was started yesterday for verification. So far all packets have been lost, and there has been no UP state.
So you are saying the state is changing to UP even though the host check isn't able to ping the host?

The only other way I see this as possible would be if you had the following in your nagios.cfg:

Code: Select all

retain_state_information=0
RockerMan
Posts: 70
Joined: Fri Nov 01, 2013 12:16 am

Re: False warning NagiosXI

Post by RockerMan »

scottwilkerson wrote: So you are saying the state is changing to UP even though the host check isn't able to ping the host?
The host was down, but somehow Nagios saw it as UP.

Code: Select all

2017-09-04 20:51:26   HOST ALERT: spb-wan-r2;UP;HARD;3;PING WARNING - Packet loss = 87%, RTA = 0.90 ms
I want to understand how Nagios could report UP if the host is down and no one has assigned its IP address to another host.
scottwilkerson wrote: The only other way I see this as possible would be if you had the following in your nagios.cfg:

Code: Select all

retain_state_information=0
No, it is set to 1:

Code: Select all

# cat /usr/local/nagios/etc/nagios.cfg | grep retain_state_information
retain_state_information=1
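
A plain grep also matches commented-out lines, so a slightly stricter lookup can be useful when checking a directive such as `retain_state_information`. The Python sketch below is one possible way to do it. The function name is hypothetical, and returning the last assignment when a directive appears more than once is an assumption of this sketch, not documented Nagios behaviour.

```python
def read_directive(config_text, name):
    """Return the value assigned to `name` in key=value text, or None.

    Blank lines and lines starting with '#' are skipped. If the directive
    appears several times, the last assignment wins (an assumption).
    """
    value = None
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == name:
            value = val.strip()
    return value


# Example use against the path shown in this thread:
# with open("/usr/local/nagios/etc/nagios.cfg") as cfg:
#     print(read_directive(cfg.read(), "retain_state_information"))
```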
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: False warning NagiosXI

Post by scottwilkerson »

RockerMan wrote:The host was down, but somehow Nagios saw it as UP.

Code: Select all

 2017-09-04 20:51:26   HOST ALERT: spb-wan-r2;UP;HARD;3;PING WARNING - Packet loss = 87%, RTA = 0.90 ms

I want to understand how Nagios could report UP if the host is down and no one has assigned its IP address to another host.
Generally I would say this is not possible, but to know for sure I would need to see the host configuration and the corresponding command configuration.
RockerMan
Posts: 70
Joined: Fri Nov 01, 2013 12:16 am

Re: False warning NagiosXI

Post by RockerMan »

Yes, I think so too.

Let's pause for now. There was one case, on 2017-09-04, and there has been no such incident since. I have now set up a continuous ping against the host, so that we can check whether the host was actually responding while in the UP state, or whether it was an accidental, one-off false positive.
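
If the continuous ping is logged with timestamps, it can be compared against the time of a Nagios UP alert. The Python sketch below shows one way to do that comparison. The `samples` list of (timestamp, replied) pairs and the five-minute window are assumptions for illustration, not something Nagios produces.

```python
from datetime import datetime, timedelta


def replies_near(samples, alert_time, window_seconds=300):
    """Return timestamps of ping replies shortly before an alert.

    samples: iterable of (datetime, bool) pairs, where the bool says
    whether that ping got a reply (a hypothetical log format).
    Only replies at most window_seconds before alert_time are returned.
    """
    window = timedelta(seconds=window_seconds)
    return [
        ts
        for ts, replied in samples
        if replied and timedelta(0) <= alert_time - ts <= window
    ]
```

An empty result for the window before an UP alert would suggest that the host itself did not answer and the alert was a false positive; a non-empty result would mean something really did reply from that address.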
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: False warning NagiosXI

Post by scottwilkerson »

OK, let us know if this comes up again.