
[Nagios-devel] BUG: Passive host check results are always in HARD

Posted: Tue Jul 04, 2006 7:45 am
by Guest
Hi,

We have a distributed Nagios set-up with three (slave) check engines
performing active checks and sending their results to a master server
which collects all results and sends out alarms if need be.

Our department has had a lot of complaints about remote hosts connected
over a WAN link producing many false positives.

Because WAN links are more prone to packet loss than LAN links, we've set
the number of host retries to 10, figuring that this would avoid any false
alerts about hosts being down while in fact it is just a temporary glitch
in the line.

This setup did not work, however. Further investigation into the cause
revealed what I believe to be a bug.

When host check results are received in PASSIVE mode, the number of retries
is not taken into account and a negative response immediately results
in a HARD state, which in turn sends out alerts.
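To make the difference concrete, here is a minimal sketch of the retry
behaviour I expected versus what I observe. It is a simplified model written
for illustration, not Nagios's actual code; the class, function and the
honour_retries flag are made-up names.

```python
from dataclasses import dataclass

UP, DOWN = 0, 1  # host status codes used in this sketch


@dataclass
class HostState:
    """Hypothetical, simplified stand-in for a host's check state."""
    max_check_attempts: int
    current_attempt: int = 1
    status: int = UP
    state_type: str = "HARD"


def apply_result(host: HostState, status: int, honour_retries: bool = True) -> HostState:
    """Apply one check result to the host.

    honour_retries=True models the expected behaviour: a problem stays SOFT
    until max_check_attempts consecutive non-UP results have been seen.
    honour_retries=False models what is observed for passive host checks:
    the first non-UP result is treated as HARD straight away.
    """
    if status == UP:
        # Recovery: reset the attempt counter.
        host.status, host.current_attempt, host.state_type = UP, 1, "HARD"
        return host

    if not honour_retries:
        # Observed: HARD immediately, attempt counter untouched (stays 1/10).
        host.status, host.state_type = status, "HARD"
        return host

    if host.status == UP:
        host.current_attempt = 1  # first failure after an UP state
    else:
        host.current_attempt = min(host.current_attempt + 1, host.max_check_attempts)
    host.status = status
    reached_limit = host.current_attempt >= host.max_check_attempts
    host.state_type = "HARD" if reached_limit else "SOFT"
    return host
```

With max_check_attempts set to 10, the expected path only reaches HARD on the
tenth consecutive DOWN result, while the observed path is HARD after the
first one.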

This is a very annoying bug because it can create a lot of unnecessary
notifications if you're monitoring a machine over a WAN link.

I first experienced this bug while running Nagios 2.2 and have recently
upgraded to 2.4, to no avail.

In our normal setup, a slave machine performs an active host check
and sends the result through nsca to the master server. But that setup is
not necessary to reproduce the buggy behaviour. You can easily do it as
follows:

1) Pick a machine in your nagios configuration that you can play with.

As you can see from the first screenshot, the machine is currently in
attempt 1/10, state type HARD and last result was passive:



2) Click on "Submit passive check result for this host"



3) Commit and wait a minute:



As can be seen, the passive check immediately results in a HARD state,
even though the attempt is only 1/10.
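The same passive result can also be submitted without the web interface, by
writing a PROCESS_HOST_CHECK_RESULT external command to the Nagios command
file. The sketch below is only an illustration: the helper names and the host
name "wan-host-1" are made up, and the command file location is whatever the
command_file directive in your nagios.cfg points to.

```python
import time
from typing import Optional

# Host status codes accepted by PROCESS_HOST_CHECK_RESULT.
HOST_UP, HOST_DOWN, HOST_UNREACHABLE = 0, 1, 2


def format_passive_host_result(host_name: str, status: int, plugin_output: str,
                               timestamp: Optional[int] = None) -> str:
    """Build one external command line for a passive host check result."""
    if status not in (HOST_UP, HOST_DOWN, HOST_UNREACHABLE):
        raise ValueError(f"unsupported host status: {status}")
    if timestamp is None:
        timestamp = int(time.time())
    return (f"[{timestamp}] PROCESS_HOST_CHECK_RESULT;"
            f"{host_name};{status};{plugin_output}\n")


def submit_external_command(command_file: str, line: str) -> None:
    """Write the command line to the Nagios external command file.

    command_file is an assumption supplied by the caller; take the real
    path from the command_file directive in nagios.cfg.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    # Hypothetical host name; prints the line instead of submitting it.
    print(format_passive_host_result("wan-host-1", HOST_DOWN, "passive test"), end="")
```

This assumes external commands and passive host checks are enabled on the
master. After submitting a DOWN result this way, the host should show the
same 1/10 HARD state as in the steps above.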

Note that PASSIVE service checks work as expected; it's only host checks
that exhibit this behaviour.

Would it be possible to post a patch for this bug, or could a fix be
incorporated in an upcoming release?

Best Regards,

Jan David


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]