Problem with freshness thresholds in 4.1.1

Posted: Fri Jan 29, 2016 5:05 am
by rejas
I'm getting a lot of false alarms from passive checks. I have raised the freshness threshold well beyond what should be needed, but many false alarms still come through. The log points to a bug: it says a service is stale by 16829d 6h 38m 22s (threshold=0d 0h 21m 40s). That looks like the time since the epoch, yet the service was checked recently.

Is this a known bug? Any known workarounds?

Regards,

Marcus

Code:

[1454050801] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;srv-app03-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates). |available_upgrades=0;;;0 critical_updates=0;;;0
[1454050801] PASSIVE SERVICE CHECK: srv-app03-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates).
[1454050802] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;thc-cyg-hcapp-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates). |available_upgrades=0;;;0 critical_updates=0;;;0
[1454050802] PASSIVE SERVICE CHECK: thc-cyg-hcapp-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates).
[1454050802] Warning: The results of service 'Apt' on host 'srv-app03-test101' are stale by 16829d 6h 38m 22s (threshold=0d 0h 21m 40s).  I'm forcing an immediate check of the service.
[1454050802] SERVICE ALERT: srv-app03-test101;Apt;WARNING;HARD;1;WARNING: Missing report. This does not necessarily indicate an error. 
[1454050802] SERVICE NOTIFICATION: kajsa;srv-app03-test101;Apt;WARNING;notify-by-email;WARNING: Missing report. This does not necessarily indicate an error. 
[1454050802] SERVICE NOTIFICATION: kalle;srv-app03-test101;Apt;WARNING;notify-by-email;WARNING: Missing report. This does not necessarily indicate an error. 
[1454051102] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;thc-cyg-hcapp-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates). |available_upgrades=0;;;0 critical_updates=0;;;0
[1454051102] PASSIVE SERVICE CHECK: thc-cyg-hcapp-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates).
[1454051102] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;srv-app03-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates). |available_upgrades=0;;;0 critical_updates=0;;;0
[1454051102] PASSIVE SERVICE CHECK: srv-app03-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates).
[1454051102] SERVICE ALERT: srv-app03-test101;Apt;OK;HARD;1;APT OK: 0 packages available for upgrade (0 critical updates).
[1454051102] SERVICE NOTIFICATION: kajsa;srv-app03-test101;Apt;OK;notify-by-email;APT OK: 0 packages available for upgrade (0 critical updates).
[1454051102] SERVICE NOTIFICATION: kalle;srv-app03-test101;Apt;OK;notify-by-email;APT OK: 0 packages available for upgrade (0 critical updates).
[1454051402] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;srv-app03-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates). |available_upgrades=0;;;0 critical_updates=0;;;0
[1454051402] PASSIVE SERVICE CHECK: srv-app03-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates).                         
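
The figures in the stale warning above can be checked by hand. Here is a minimal sketch in C, assuming the logged staleness is simply the current time minus the expiration time. The helper names `dhms_to_seconds` and `implied_expiration` are made up for this example and are not Nagios functions.

```c
#include <assert.h>

/* Hypothetical helper, not a Nagios function: convert the
 * "Nd Nh Nm Ns" figures printed in the log into seconds. */
long long dhms_to_seconds(long long d, long long h, long long m, long long s)
{
    assert(d >= 0 && h >= 0 && m >= 0 && s >= 0);
    return ((d * 24 + h) * 60 + m) * 60 + s;
}

/* Assumption: staleness = current_time - expiration_time, so the
 * expiration time can be recovered by subtraction. */
long long implied_expiration(long long current_time, long long stale_by)
{
    return current_time - stale_by;
}
```

Feeding in the warning logged at 1454050802 (stale by 16829d 6h 38m 22s) gives an implied expiration time of 1300, which is exactly the 21m 40s threshold expressed in seconds. That would fit a last-check time of 0 having been used for that one evaluation, although this is only an inference from the arithmetic.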

Re: Problem with freshness thresholds in 4.1.1

Posted: Fri Jan 29, 2016 2:32 pm
by hsmith
Is the time on your system off by chance?

Code:

date

Re: Problem with freshness thresholds in 4.1.1

Posted: Sat Jan 30, 2016 11:13 am
by rejas
hsmith wrote:Is the time on your system off by chance?

No, I don't think so. The time is correct. However, this is on a virtual server, so I'll investigate this lead in more depth. Thanks for the input!

Marcus

Re: Problem with freshness thresholds in 4.1.1

Posted: Mon Feb 01, 2016 11:07 am
by hsmith
No problem! Let me know what you come up with.

Re: Problem with freshness thresholds in 4.1.1

Posted: Wed Feb 03, 2016 8:36 am
by rejas
I can't seem to find any issues with the time. I added some debugging and came up with this.

Code:

[1454505675] Warning: The results of service 'CPU' on host 'all-xxx1' are stale by 0d 0h 4m 1s (threshold=0d 0h 6m 0s) (current_time=1454505675, expiration_time=1454505434).  I'm forcing an immediate check of the service.
[1454505675] Warning: The results of service 'Disk' on host 'all-xxx1' are stale by 0d 0h 4m 1s (threshold=0d 0h 6m 0s) (current_time=1454505675, expiration_time=1454505434).  I'm forcing an immediate check of the service.
[1454505675] Warning: The results of service 'Memory' on host 'all-xxx1' are stale by 0d 0h 4m 1s (threshold=0d 0h 6m 0s) (current_time=1454505675, expiration_time=1454505434).  I'm forcing an immediate check of the service.
[1454505675] Warning: The results of service 'Processes' on host 'all-xxx1' are stale by 0d 0h 4m 1s (threshold=0d 0h 6m 0s) (current_time=1454505675, expiration_time=1454505434).  I'm forcing an immediate check of the service.
[1454506125] Warning: The results of service 'Swap usage' on host 'hms-xxx1' are stale by 16834d 9h 9m 40s (threshold=0d 0h 16m 40s) (current_time=1454506125, expiration_time=15545).  I'm forcing an immediate check of the service.
The expiration_time is wrong now and then: in the last line it is 15545 rather than a recent timestamp. I'm a little stuck here...
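
To narrow down where the bad value comes from, the debug output can be worked backwards. This is a sketch only, and it assumes expiration_time is computed as the last check time plus the freshness threshold. That assumption is mine and is not confirmed from the Nagios source here. `implied_last_check` and `implied_age` are hypothetical helper names.

```c
#include <assert.h>
#include <time.h>

/* Assumption (not verified against the Nagios source):
 * expiration_time = last_check + threshold, so the last check
 * time can be backed out of the logged values. */
time_t implied_last_check(time_t expiration_time, time_t threshold)
{
    assert(threshold >= 0);
    return expiration_time - threshold;
}

/* Age of that implied last check when the warning was logged. */
time_t implied_age(time_t current_time, time_t expiration_time, time_t threshold)
{
    return current_time - implied_last_check(expiration_time, threshold);
}
```

For the 'CPU' line (expiration_time=1454505434, threshold 360 seconds) this gives a last check at 1454505074, about ten minutes before the warning, which looks plausible. For the 'Swap usage' line (expiration_time=15545, threshold 1000 seconds) it gives 14545, roughly four hours after the epoch. Under this assumption the stored last check time looks wrong, not the threshold.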

Re: Problem with freshness thresholds in 4.1.1

Posted: Wed Feb 03, 2016 5:53 pm
by tgriep
Can you post the configuration for one of the service checks that is having the issue so we can review it?

Re: Problem with freshness thresholds in 4.1.1

Posted: Thu Feb 04, 2016 2:24 am
by rejas
I did not find a solution to this, and I do not have the time to investigate it further. As an ugly workaround, I added the following to is_service_result_fresh, and the corresponding check to is_host_result_fresh:

Code:

                /* Added by MR. Just check for insanely small expiration times */
                if (expiration_time < 1400000000) {

                        logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: The results of service '%s' on host '%s' are stale by %dd %dh %dm %ds (threshold=%dd %dh %dm %ds) (current_time=%d, expiration_time=%d) but it's too much. Letting it pass.\n", temp_service->description, temp_service->host_name, days, hours, minutes, seconds, tdays, thours, tminutes, tseconds, (int)current_time, (int)expiration_time);

                        log_debug_info(DEBUGL_CHECKS, 1, "Check results for service '%s' on host '%s' are stale by %dd %dh %dm %ds (threshold=%dd %dh %dm %ds) but it's too much. Letting it pass.\n", temp_service->description, temp_service->host_name, days, hours, minutes, seconds, tdays, thours, tminutes, tseconds);

                        return TRUE;
                }
It is now working as I expect. I'll put finding the real source of the problem on my to-do list.
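
A possible variation on the workaround, as a sketch only: instead of the hard-coded 1400000000 cutoff, the check could be made relative to the current time, so it does not age. The function name `expiration_is_implausible` and the one-year limit are invented for this example and are not part of Nagios.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical limit: anything "stale" by more than a year is
 * treated as a bogus timestamp, not as a genuinely stale result. */
#define MAX_PLAUSIBLE_STALENESS ((time_t)365 * 24 * 60 * 60)

/* Returns 1 if expiration_time lies so far in the past that it is
 * more likely corrupt or uninitialised than truly stale, else 0. */
int expiration_is_implausible(time_t current_time, time_t expiration_time)
{
    assert(current_time >= 0);
    if (expiration_time >= current_time)
        return 0; /* not expired at all */
    return (current_time - expiration_time) > MAX_PLAUSIBLE_STALENESS;
}
```

With the logged values, current_time=1454506125 and expiration_time=15545 would be flagged, while current_time=1454505675 and expiration_time=1454505434 would not. In the patch, the condition could then use the current_time variable that the freshness function already has, but that is untested.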

/Marcus

Re: Problem with freshness thresholds in 4.1.1

Posted: Thu Feb 04, 2016 2:28 am
by rejas
tgriep wrote:Can you post the configuration for one of the service checks that is having the issue so we can review it?

Code:

define service {
        name                            xxxxxx-service
        use                             generic-service
        register                        0
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           5
        retry_check_interval            1
        notification_interval           0
        notification_period             24x7
        notification_options            w,u,c,r
        notes_url                       http://xxxxxxxxxxxxxxxxxxxxx/$SERVICEDESC$
        _PROBLEM_PROCESS                Saknas
        _PROBLEM_PRIORITY               Saknas
}


define service {
        host_name                       all-xxxxxxxx1
        service_description             CPU
        use                             xxxxxx-service
        check_command                   check_passive_missing
        check_freshness                 1
        freshness_threshold             660
        active_checks_enabled           0
        passive_checks_enabled          1
}
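
To compare a configured freshness_threshold (given in seconds) with the "threshold=Nd Nh Nm Ns" text in the log, the value can be split up the same way. This is a small sketch; `struct dhms` and `seconds_to_dhms` are names made up for this example and are not taken from Nagios.

```c
#include <assert.h>

/* Hypothetical container for a duration split into log-style parts. */
struct dhms {
    long days;
    long hours;
    long minutes;
    long seconds;
};

/* Split a non-negative number of seconds into days/hours/minutes/seconds. */
struct dhms seconds_to_dhms(long total)
{
    struct dhms out;

    assert(total >= 0);
    out.days = total / 86400;
    total %= 86400;
    out.hours = total / 3600;
    total %= 3600;
    out.minutes = total / 60;
    out.seconds = total % 60;
    return out;
}
```

A freshness_threshold of 660 comes out as 0d 0h 11m 0s. The 6m 0s and 16m 40s thresholds in the warnings posted earlier correspond to 360 and 1000 seconds.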


Re: Problem with freshness thresholds in 4.1.1

Posted: Thu Feb 04, 2016 4:29 pm
by tmcdonald
If you want, you can post this to our GitHub so our Core dev can take a look at it: https://github.com/NagiosEnterprises/nagioscore/issues

That'll get more traction on our end.

Re: Problem with freshness thresholds in 4.1.1

Posted: Fri Feb 05, 2016 4:48 am
by rejas
tmcdonald wrote:If you want, you can post this to our GitHub so our Core dev can take a look at it: https://github.com/NagiosEnterprises/nagioscore/issues

That'll get more traction on our end.
Thanks!

I'm currently under a very heavy workload. My ugly patch works for now, so I'll look into it later.

/Marcus