[Nagios-devel] Passive services going stale on a Nagios restart
Posted: Mon Feb 13, 2006 7:05 am
Hi Ethan,
I'm running a distributed monitoring setup with freshness checking on
the master server for passive checks. If the master is stopped for a
long time and then restarted, the passive checks go stale at the next
freshness cycle because there is not enough time for the slaves to
send results back.
In base/checks.c, there is some code that takes program_start into
account, but it only applies when active checks are enabled. I've
removed that condition (checks_enabled) and this works for me now.
This is the patch:
--- checks.c.2.0 2006-02-13 11:57:09.181245510 +0000
+++ checks.c 2006-02-13 12:00:02.726750637 +0000
@@ -1758,7 +1758,9 @@
         /* calculate expiration time */
         /* CHANGED 11/10/05 EG - program start is only used in expiration time calculation if > last check AND active checks are enabled, so active checks can become stale immediately upon program startup */
-        if(temp_service->has_been_checked==FALSE || (temp_service->checks_enabled==TRUE && program_start>temp_service->last_check))
+        /* if(temp_service->has_been_checked==FALSE || (temp_service->checks_enabled==TRUE && program_start>temp_service->last_check)) */
+        /* Passive checks immediately go stale, so ignore the checks_enabled setting */
+        if(temp_service->has_been_checked==FALSE || program_start>temp_service->last_check)
                 expiration_time=(time_t)(program_start+freshness_threshold);
         else
                 expiration_time=(time_t)(temp_service->last_check+freshness_threshold);
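
To make the effect of the change concrete, here is a minimal standalone sketch of the two expiration rules side by side. It is not the Nagios source: struct svc and the function names expiration_original and expiration_patched are hypothetical, and only the three fields used in the condition are modelled.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, trimmed-down stand-in for the Nagios service struct. */
struct svc {
    bool   has_been_checked;
    bool   checks_enabled;   /* true when active checks are enabled */
    time_t last_check;
};

/* Rule before the patch: program_start is only used as the base time
   when active checks are enabled. */
time_t expiration_original(const struct svc *s, time_t program_start,
                           int freshness_threshold)
{
    if (!s->has_been_checked ||
        (s->checks_enabled && program_start > s->last_check))
        return program_start + freshness_threshold;
    return s->last_check + freshness_threshold;
}

/* Rule after the patch: program_start is used whenever it is later
   than the last check, regardless of checks_enabled. */
time_t expiration_patched(const struct svc *s, time_t program_start,
                          int freshness_threshold)
{
    if (!s->has_been_checked || program_start > s->last_check)
        return program_start + freshness_threshold;
    return s->last_check + freshness_threshold;
}
```

For a passive-only service last checked at t=1000, with the master restarted at t=5000 and a threshold of 300 seconds, the original rule gives an expiration of 1300 (already in the past at startup), while the patched rule gives 5300, so the slaves have a full threshold period to send results.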
Ton
http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]