[Nagios-devel] [PATCH] notifications: Fix first_notification_delay

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
Locked
Guest

[Nagios-devel] [PATCH] notifications: Fix first_notification_delay

Post by Guest »

Nagios bug #297, while being light on details, points out a problem with
first_notification_delay that I think this patch clears up.

The problem is two-fold: first, the logic for finding out which state to
use is incorrect - a comment rightly points out that last_time_{ok,up}
is the wrong value to use, and then proceeds to use any value that is
both more recent and less recent than last_time_{ok,up} - this condition
is likely to fail, meaning we always use the incorrect value.

Second, the logic is impossible to correct, though, because all the
other values are incorrect, too - we always set last_time_$state
whenever a check is executed, meaning the problem last_time_$state
variable would usually be too recent.

This is fixed by redefining last_time_$state. They used to be "the last
time a check returned $state", now they're "the last time an object was
in $state". The change is subtle, and the value is usually updated the
way it was before, but when the state changes, we put the end time of
the previous state in last_time_$previous_state, rather than putting the
start time of the current state in last_time_$current_state. Nagios core
does not use these variables anywhere else, and it's unlikely to break
other peoples code, since last_state_change is a much better way to get
the start time of the current state.

With this change in place, all we need to do to find out when a problem
began is to read last_time_{ok,up}, use that to compare against
first_notification_delay, and we're done.

Signed-off-by: Robin Sonefors
---
base/checks.c | 8 ++++----
base/notifications.c | 22 +++++-----------------
2 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/base/checks.c b/base/checks.c
index 37aea54..af9e6ba 100644
--- a/base/checks.c
+++ b/base/checks.c
@@ -458,8 +458,8 @@ int handle_async_service_check_result(service *temp_service, check_result *queue
}


- /* record the last state time */
- switch(temp_service->current_state) {
+ /* record the time the last state ended */
+ switch(temp_service->last_state) {
case STATE_OK:
temp_service->last_time_ok = temp_service->last_check;
break;
@@ -3236,8 +3236,8 @@ int handle_host_state(host *hst) {
/* update performance data */
update_host_performance_data(hst);

- /* record latest time for current state */
- switch(hst->current_state) {
+ /* record the time the last state ended */
+ switch(hst->last_state) {
case HOST_UP:
hst->last_time_up = current_time;
break;
diff --git a/base/notifications.c b/base/notifications.c
index 2dff2ef..226bc42 100644
--- a/base/notifications.c
+++ b/base/notifications.c
@@ -522,19 +522,12 @@ int check_service_notification_viability(service *svc, int type, int options) {

/* see if enough time has elapsed for first notification (Mathias Sundman) */
/* 10/02/07 don't place restrictions on recoveries or non-normal notifications, must use last time ok (or program start) in calculation */
- /* it is reasonable to assume that if the host was never up, the program start time should be used in this calculation */
+ /* it is reasonable to assume that if the service was never up, the program start time should be used in this calculation */
if(type == NOTIFICATION_NORMAL && svc->current_notification_number == 0 && svc->current_state != STATE_OK) {

- /* determine the time to use of the first problem point */
- first_problem_time = svc->last_time_ok; /* not accurate, but its the earliest time we could use in the comparison */
- if((svc->last_time_warning last_time_warning > svc->last_time_ok))
- first_problem_time = svc->last_time_warning;
- if((svc->last_time_unknown last_time_unknown > svc->last_time_ok))
- first_problem_time = svc->last_time_unknown;
- if((svc->last_time_critical last_time_critical > svc->last_time_ok))
- first_problem_time = svc->last_time_critical;
+ first_problem_time = svc->last_time_ok > 0 ? svc->last_time_ok : program_start;

- if(current_time < (time_t)((first_problem_time == (time_t)0L) ? p

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: robin.sonefors@op5.com
Locked