Re: [Nagios-devel] Passive host down result is interpreted as up on


Hi!

On 16 Mar 2007, at 18:02, Ton Voon wrote:

> I was wondering if anyone has seen this before. On a slave, we have
> a host that is marked as DOWN with a plugin output of "CRITICAL -
> Plugin timed out after 10 seconds", as expected. However, on the
> master, that host is marked as UP with the same text.
>
>
> The logs on the master server, show:
>
> [1174045717] EXTERNAL COMMAND:
> PROCESS_HOST_CHECK_RESULT;host1;0;PING OK - Packet loss = 0%, RTA =
> 0.37 ms|
>
> Host is marked as UP. Later on:
>
> [1174045949] EXTERNAL COMMAND:
> PROCESS_HOST_CHECK_RESULT;host1;1;CRITICAL - Plugin timed out after
> 10 seconds|
>
> Failure arrives.
>
> [1174045949] HOST ALERT: host1;DOWN;HARD;1;CRITICAL - Plugin timed
> out after 10 seconds
>
> Marked it as DOWN with alert. As expected.
>
> [1174045951] Warning: The results of service '/ - partition' on
> host 'host1' are stale by 24 seconds (threshold=82 seconds). I'm
> forcing an immediate check of the service.
> [1174045953] SERVICE ALERT: host1;/ - partition;UNKNOWN;HARD;
> 1;UNKNOWN: Service results are stale
> [1174045959] EXTERNAL COMMAND:
> PROCESS_HOST_CHECK_RESULT;host1;1;CRITICAL - Plugin timed out after
> 10 seconds|
>
> More passive results
>
> [1174045971] EXTERNAL COMMAND:
> PROCESS_HOST_CHECK_RESULT;host1;1;CRITICAL - Plugin timed out after
> 10 seconds|
>
> And again, but this time...
>
> [1174045973] HOST ALERT: host1;UP;HARD;1;CRITICAL - Plugin timed
> out after 10 seconds
>
> Nagios has marked the host as UP, even though the
> PROCESS_HOST_CHECK_RESULT reported DOWN (status 1).
>
>
> The complete nagios.log around this period is attached. I'm at a
> loss to understand why this has happened. Has anyone got any clues,
> or seen something similar?
>
> We haven't been able to reproduce this consistently yet.
>
> This is on Nagios 2.5 (with some local patches).
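
For reference, the passive results in the log above have the form PROCESS_HOST_CHECK_RESULT;<host_name>;<status>;<plugin_output>, where status 0 is UP and 1 is DOWN. A minimal sketch of formatting such a line follows; the function name format_passive_host_result and the buffer handling are illustrative assumptions, not code from Nagios or from the slave setup described here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Host status codes as they appear in the passive results above. */
enum { HOST_UP = 0, HOST_DOWN = 1, HOST_UNREACHABLE = 2 };

/* Hypothetical helper: format one passive host check result line.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_passive_host_result(char *buf, size_t len, time_t when,
                               const char *host_name, int status,
                               const char *plugin_output)
{
    int n = snprintf(buf, len, "[%lu] PROCESS_HOST_CHECK_RESULT;%s;%d;%s\n",
                     (unsigned long)when, host_name, status, plugin_output);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Usage: rebuild the DOWN result seen at 1174045949 in the log. */
void example_passive_result(void)
{
    char buf[256];
    int n = format_passive_host_result(buf, sizeof buf, (time_t)1174045949,
                                       "host1", HOST_DOWN,
                                       "CRITICAL - Plugin timed out after 10 seconds");
    assert(n > 0);
    assert(strcmp(buf, "[1174045949] PROCESS_HOST_CHECK_RESULT;host1;1;"
                       "CRITICAL - Plugin timed out after 10 seconds\n") == 0);
}
```

The third field is what the master should act on, so a result carrying status 1 followed by a HOST ALERT of UP points at something other than the submitted result changing the state.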


We think we've found the root problem.

In checks.c, when a host has no check_command, the code prints the
debug line "No host check command specified, so no check will be done
(host state assumed to be unchanged)" but then returns HOST_UP,
regardless of the host's actual state. We have amended this to return
hst->current_state instead.

In our distributed setup, we define a host without a check_command
and rely instead on the passive host results sent by the slave.
However, on the master, if a service on this host exceeds its
freshness threshold, a host check is scheduled with the force flag.
That check reaches this portion of the code, which returns HOST_UP
rather than the current state, so the host is shown with an
incorrect state.
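
The difference can be sketched in isolation. This is only an illustration under stated assumptions: the host struct is a cut-down stand-in for the real Nagios structure, and forced_host_check with its patched flag is a hypothetical name, not a function in checks.c.

```c
#include <assert.h>
#include <stddef.h>

#define HOST_UP   0
#define HOST_DOWN 1

/* Hypothetical, cut-down stand-in for the Nagios host structure. */
typedef struct {
    const char *host_check_command; /* NULL for a passive-only host */
    int current_state;              /* state set by the last passive result */
} host;

/* Models only the "no check command" branch of a forced host check.
 * patched == 0: original behaviour, always report HOST_UP.
 * patched != 0: behaviour after our change, keep the current state.
 * Returns -1 for hosts that do have a command (not modelled here). */
int forced_host_check(const host *hst, int patched)
{
    if (hst->host_check_command == NULL)
        return patched ? hst->current_state : HOST_UP;
    return -1;
}

/* Usage: a passive DOWN result has been applied, then a stale service
 * triggers a forced host check on the master. */
void example_forced_check(void)
{
    host h = { NULL, HOST_DOWN };
    assert(forced_host_check(&h, 0) == HOST_UP);   /* original: flips to UP */
    assert(forced_host_check(&h, 1) == HOST_DOWN); /* patched: stays DOWN */
}
```

With the original return value, any forced check on a passive-only host resets it to UP, which matches the HOST ALERT at 1174045973 in the log.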

Our patch is below, made against Nagios 2.8.

Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon


Attachment: nagios_no_host_check_command_returns_current_state.patch

diff -ur nagios-2.8.original/base/checks.c nagios-2.8/base/checks.c
--- nagios-2.8.original/base/checks.c 2007-03-19 15:16:38.375621511 +0000
+++ nagios-2.8/base/checks.c 2007-03-19 15:19:31.983526254 +0000
@@ -2427,7 +2427,9 @@
         printf("\tNo host check command specified, so no check will be done (host state assumed to be unchanged)!\n");
 #endif

-        return HOST_UP;
+        /* Altinity patch: This should return the current state, rather than assume server is up. Incorrect in a distributed setup */
+        /* return HOST_UP; */
+        return hst->current_state;
         }

     /* grab the host macros */

This post was automatically imported from historical nagios-devel mailing list archives