Problem with freshness thresholds in 4.1.1

rejas
Posts: 6
Joined: Wed Nov 02, 2011 5:05 am

Post by rejas »

I'm getting a lot of false alarms from passive checks. I have raised the freshness threshold well beyond what should be needed, but many false alarms still come through. The log points to a bug: it reports a service as stale by 16829d 6h 38m 22s (threshold=0d 0h 21m 40s). That is roughly the time elapsed since the Unix epoch, yet the log shows a passive result for the same service one second earlier.

Is this a known bug? Any known workarounds?

Regards,

Marcus

Code:

[1454050801] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;srv-app03-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates). |available_upgrades=0;;;0 critical_updates=0;;;0
[1454050801] PASSIVE SERVICE CHECK: srv-app03-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates).
[1454050802] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;thc-cyg-hcapp-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates). |available_upgrades=0;;;0 critical_updates=0;;;0
[1454050802] PASSIVE SERVICE CHECK: thc-cyg-hcapp-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates).
[1454050802] Warning: The results of service 'Apt' on host 'srv-app03-test101' are stale by 16829d 6h 38m 22s (threshold=0d 0h 21m 40s).  I'm forcing an immediate check of the service.
[1454050802] SERVICE ALERT: srv-app03-test101;Apt;WARNING;HARD;1;WARNING: Missing report. This does not necessarily indicate an error. 
[1454050802] SERVICE NOTIFICATION: kajsa;srv-app03-test101;Apt;WARNING;notify-by-email;WARNING: Missing report. This does not necessarily indicate an error. 
[1454050802] SERVICE NOTIFICATION: kalle;srv-app03-test101;Apt;WARNING;notify-by-email;WARNING: Missing report. This does not necessarily indicate an error. 
[1454051102] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;thc-cyg-hcapp-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates). |available_upgrades=0;;;0 critical_updates=0;;;0
[1454051102] PASSIVE SERVICE CHECK: thc-cyg-hcapp-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates).
[1454051102] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;srv-app03-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates). |available_upgrades=0;;;0 critical_updates=0;;;0
[1454051102] PASSIVE SERVICE CHECK: srv-app03-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates).
[1454051102] SERVICE ALERT: srv-app03-test101;Apt;OK;HARD;1;APT OK: 0 packages available for upgrade (0 critical updates).
[1454051102] SERVICE NOTIFICATION: kajsa;srv-app03-test101;Apt;OK;notify-by-email;APT OK: 0 packages available for upgrade (0 critical updates).
[1454051102] SERVICE NOTIFICATION: kalle;srv-app03-test101;Apt;OK;notify-by-email;APT OK: 0 packages available for upgrade (0 critical updates).
[1454051402] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;srv-app03-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates). |available_upgrades=0;;;0 critical_updates=0;;;0
[1454051402] PASSIVE SERVICE CHECK: srv-app03-test101;Apt;0;APT OK: 0 packages available for upgrade (0 critical updates).                         
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: Problem with freshness thresholds in 4.1.1

Post by hsmith »

Is the time on your system off by chance?

Code:

date
Former Nagios Employee.
me.
rejas
Posts: 6
Joined: Wed Nov 02, 2011 5:05 am

Re: Problem with freshness thresholds in 4.1.1

Post by rejas »

hsmith wrote:Is the time on your system off by chance?

No, I don't think so. The time is correct. However, this is a virtual server, so I'll investigate that lead in more depth. Thanks for the input!

Marcus
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: Problem with freshness thresholds in 4.1.1

Post by hsmith »

No problem! Let me know what you come up with.
rejas
Posts: 6
Joined: Wed Nov 02, 2011 5:05 am

Re: Problem with freshness thresholds in 4.1.1

Post by rejas »

I can't find any issues with the system time. I added current_time and expiration_time to the log message for debugging and got this:

Code:

[1454505675] Warning: The results of service 'CPU' on host 'all-xxx1' are stale by 0d 0h 4m 1s (threshold=0d 0h 6m 0s) (current_time=1454505675, expiration_time=1454505434).  I'm forcing an immediate check of the service.
[1454505675] Warning: The results of service 'Disk' on host 'all-xxx1' are stale by 0d 0h 4m 1s (threshold=0d 0h 6m 0s) (current_time=1454505675, expiration_time=1454505434).  I'm forcing an immediate check of the service.
[1454505675] Warning: The results of service 'Memory' on host 'all-xxx1' are stale by 0d 0h 4m 1s (threshold=0d 0h 6m 0s) (current_time=1454505675, expiration_time=1454505434).  I'm forcing an immediate check of the service.
[1454505675] Warning: The results of service 'Processes' on host 'all-xxx1' are stale by 0d 0h 4m 1s (threshold=0d 0h 6m 0s) (current_time=1454505675, expiration_time=1454505434).  I'm forcing an immediate check of the service.
[1454506125] Warning: The results of service 'Swap usage' on host 'hms-xxx1' are stale by 16834d 9h 9m 40s (threshold=0d 0h 16m 40s) (current_time=1454506125, expiration_time=15545).  I'm forcing an immediate check of the service.
The expiration_time is occasionally wrong: in the last line it is 15545, while the other lines show values close to current_time. I'm a little stuck here.
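
For reference, the "stale by" figures in these lines are consistent with a plain subtraction of expiration_time from current_time. The following is a minimal sketch in C that reproduces the arithmetic. It is not taken from the Nagios source; the struct and the function names (duration, split_seconds, stale_seconds, format_stale) are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Illustrative type, not part of Nagios: a duration split into parts. */
struct duration {
    long days;
    int hours;
    int minutes;
    int seconds;
};

/* Split a non-negative number of seconds into days/hours/minutes/seconds. */
struct duration split_seconds(long long total)
{
    struct duration d;
    assert(total >= 0);
    d.days = (long)(total / 86400);
    total %= 86400;
    d.hours = (int)(total / 3600);
    total %= 3600;
    d.minutes = (int)(total / 60);
    d.seconds = (int)(total % 60);
    return d;
}

/* Staleness as implied by the debug output: current_time - expiration_time. */
long long stale_seconds(time_t current_time, time_t expiration_time)
{
    return (long long)current_time - (long long)expiration_time;
}

/* Format the result in the same "Nd Nh Nm Ns" layout the log lines use. */
void format_stale(time_t current_time, time_t expiration_time,
                  char *out, size_t out_len)
{
    struct duration d = split_seconds(stale_seconds(current_time, expiration_time));
    snprintf(out, out_len, "%ldd %dh %dm %ds",
             d.days, d.hours, d.minutes, d.seconds);
}
```

With the values from the 'Swap usage' line (current_time=1454506125, expiration_time=15545) the difference is 1454490580 seconds, which splits into 16834d 9h 9m 40s, exactly the figure in the log. The 'CPU' line gives 241 seconds, or 0d 0h 4m 1s. So the huge staleness is just the result of subtracting a near-zero expiration_time.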
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Problem with freshness thresholds in 4.1.1

Post by tgriep »

Can you post the configuration for one of the service checks that is having the issue so we can review it?
rejas
Posts: 6
Joined: Wed Nov 02, 2011 5:05 am

Re: Problem with freshness thresholds in 4.1.1

Post by rejas »

I did not find a solution to this, and I don't have the time to investigate it further. As an ugly workaround, I added the following to is_service_result_fresh, and the corresponding check to is_host_result_fresh:

Code:

                /* Added by MR. Just check for insanely small expiration times */
                if (expiration_time < 1400000000) {
                        logit(NSLOG_RUNTIME_WARNING, TRUE,
                              "Warning: The results of service '%s' on host '%s' are stale by %dd %dh %dm %ds (threshold=%dd %dh %dm %ds) (current_time=%d, expiration_time=%d) but it's too much. Letting it pass.\n",
                              temp_service->description, temp_service->host_name,
                              days, hours, minutes, seconds,
                              tdays, thours, tminutes, tseconds,
                              (int)current_time, (int)expiration_time);

                        log_debug_info(DEBUGL_CHECKS, 1,
                              "Check results for service '%s' on host '%s' are stale by %dd %dh %dm %ds (threshold=%dd %dh %dm %ds) but it's too much. Letting it pass.\n",
                              temp_service->description, temp_service->host_name,
                              days, hours, minutes, seconds,
                              tdays, thours, tminutes, tseconds);

                        return TRUE;
                }
It is now working as I expect. I'll put finding the real source of the problem on my to-do list.
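
The patch relies on a hard-coded cutoff. One possible variation, sketched here as a standalone function and not as a tested change to Nagios, expresses the same guard relative to the current time. The function name and the one-year limit are assumptions chosen for illustration; neither comes from the Nagios source.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical limit: a result "stale" by more than this is treated as bogus. */
#define MAX_PLAUSIBLE_STALENESS (365L * 24L * 60L * 60L) /* one year, in seconds */

/*
 * Returns true when expiration_time lies so far behind current_time that it
 * is more likely an invalid value than a genuinely stale check result.
 */
bool expiration_time_is_implausible(time_t current_time, time_t expiration_time)
{
    if (expiration_time > current_time)
        return false; /* not expired yet, nothing to judge */

    return (long long)current_time - (long long)expiration_time
           > MAX_PLAUSIBLE_STALENESS;
}
```

For the values logged earlier in the thread, this flags expiration_time=15545 against current_time=1454506125, and leaves expiration_time=1454505434 against current_time=1454505675 alone. Like the original patch, it only hides the symptom; it does not explain where the bad value comes from.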

/Marcus
rejas
Posts: 6
Joined: Wed Nov 02, 2011 5:05 am

Re: Problem with freshness thresholds in 4.1.1

Post by rejas »

tgriep wrote:Can you post the configuration for one of the service checks that is having the issue so we can review it?

Code:

define service {
        name                            xxxxxx-service
        use                             generic-service
        register                        0
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           5
        retry_check_interval            1
        notification_interval           0
        notification_period             24x7
        notification_options            w,u,c,r
        notes_url                       http://xxxxxxxxxxxxxxxxxxxxx/$SERVICEDESC$
        _PROBLEM_PROCESS                Saknas
        _PROBLEM_PRIORITY               Saknas
}


define service {
        host_name                       all-xxxxxxxx1
        service_description             CPU
        use                             xxxxxx-service
        check_command                   check_passive_missing
        check_freshness                 1
        freshness_threshold             660
        active_checks_enabled           0
        passive_checks_enabled          1
}

tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Problem with freshness thresholds in 4.1.1

Post by tmcdonald »

If you want, you can post this to our Github so our Core dev can take a look at it: https://github.com/NagiosEnterprises/nagioscore/issues

That'll get more traction on our end.
Former Nagios employee
rejas
Posts: 6
Joined: Wed Nov 02, 2011 5:05 am

Re: Problem with freshness thresholds in 4.1.1

Post by rejas »

tmcdonald wrote:If you want, you can post this to our Github so our Core dev can take a look at it: https://github.com/NagiosEnterprises/nagioscore/issues

That'll get more traction on our end.
Thanks!

I'm under a very heavy workload at the moment. My ugly patch works for now; I'll look into it later.

/Marcus