Huge number of active services are stale
Posted: Mon Mar 18, 2019 5:36 am
Hi guys,
A huge number of services defined as active checks are being reported as stale in nagios.log:
...
[Mon Mar 18 11:23:08 2019] Warning: The results of service 'Running Processes' on host '1287000114_10_cucon-01' are stale by 0d 0h 0m 10s (threshold=0d 0h 4m 15s). I'm forcing an immediate check of the service.
[Mon Mar 18 11:23:08 2019] Warning: The results of service 'Running Processes' on host '1287000114_10_imp-01' are stale by 0d 0h 0m 6s (threshold=0d 0h 4m 15s). I'm forcing an immediate check of the service.
[Mon Mar 18 11:24:08 2019] Warning: The results of service 'Device Power Supply' on host '1287000012_00_ccme-01' are stale by 0d 0h 0m 1s (threshold=0d 0h 4m 15s). I'm forcing an immediate check of the service.
...
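To show how widespread this is, here is a rough Python sketch that counts these warnings per service and converts the "stale by" and "threshold" values to seconds. The regular expression is based only on the three log lines quoted above, and the names `STALE_RE`, `parse_stale` and `summarize` are made up for illustration. Reading the real log (probably /usr/local/nagios/var/nagios.log, but that path is an assumption) would replace the `SAMPLE` list.

```python
import re
from collections import Counter

# Pattern derived from the warning lines quoted in this post (an assumption:
# other Nagios versions may word the message differently).
STALE_RE = re.compile(
    r"Warning: The results of service '(?P<service>[^']+)' "
    r"on host '(?P<host>[^']+)' "
    r"are stale by (?P<d>\d+)d (?P<h>\d+)h (?P<m>\d+)m (?P<s>\d+)s "
    r"\(threshold=(?P<td>\d+)d (?P<th>\d+)h (?P<tm>\d+)m (?P<ts>\d+)s\)"
)


def to_seconds(d, h, m, s):
    """Convert a 'Xd Xh Xm Xs' duration to seconds."""
    return ((int(d) * 24 + int(h)) * 60 + int(m)) * 60 + int(s)


def parse_stale(lines):
    """Yield (service, host, stale_seconds, threshold_seconds) per warning."""
    for line in lines:
        m = STALE_RE.search(line)
        if m:
            yield (
                m["service"],
                m["host"],
                to_seconds(m["d"], m["h"], m["m"], m["s"]),
                to_seconds(m["td"], m["th"], m["tm"], m["ts"]),
            )


def summarize(lines):
    """Count stale warnings per service name."""
    return Counter(service for service, _, _, _ in parse_stale(lines))


# The three lines from the log excerpt above, used as sample input.
SAMPLE = [
    "[Mon Mar 18 11:23:08 2019] Warning: The results of service 'Running Processes' on host '1287000114_10_cucon-01' are stale by 0d 0h 0m 10s (threshold=0d 0h 4m 15s). I'm forcing an immediate check of the service.",
    "[Mon Mar 18 11:23:08 2019] Warning: The results of service 'Running Processes' on host '1287000114_10_imp-01' are stale by 0d 0h 0m 6s (threshold=0d 0h 4m 15s). I'm forcing an immediate check of the service.",
    "[Mon Mar 18 11:24:08 2019] Warning: The results of service 'Device Power Supply' on host '1287000012_00_ccme-01' are stale by 0d 0h 0m 1s (threshold=0d 0h 4m 15s). I'm forcing an immediate check of the service.",
]

for service, count in summarize(SAMPLE).most_common():
    print(f"{count:6d}  {service}")
```

Note that the threshold in every quoted line is 4m 15s (255 seconds), and the results are only a few seconds past it.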
I expected this behavior only for passive checks.
The monitoring engine itself looks healthy, as you can see below:
#############################################################
[root@nagios-01: /usr/local/nagios]# /usr/local/nagios/bin/nagiostats
Nagios Stats 4.4.3
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 2019-01-15
License: GPL
CURRENT STATUS DATA
------------------------------------------------------
Status File: /usr/local/nagios/var/status.dat
Status File Age: 0d 0h 0m 6s
Status File Version: 4.4.3
Program Running Time: 1d 10h 25m 39s
Nagios PID: 29706
Total Services: 16832
Services Checked: 16832
Services Scheduled: 15491
Services Actively Checked: 16771
Services Passively Checked: 61
Total Service State Change: 0.000 / 28.680 / 0.017 %
Active Service Latency: 0.000 / 1.076 / 0.074 sec
Active Service Execution Time: 0.001 / 60.006 / 1.006 sec
Active Service State Change: 0.000 / 28.680 / 0.014 %
Active Services Last 1/5/15/60 min: 1153 / 7478 / 15148 / 16687
Passive Service Latency: 0.006 / 0.993 / 0.539 sec
Passive Service State Change: 0.000 / 25.200 / 0.677 %
Passive Services Last 1/5/15/60 min: 1 / 1 / 1 / 2
Services Ok/Warn/Unk/Crit: 16481 / 80 / 255 / 16
Services Flapping: 0
Services In Downtime: 0
Total Hosts: 2717
Hosts Checked: 2715
Hosts Scheduled: 2717
Hosts Actively Checked: 2717
Host Passively Checked: 0
Total Host State Change: 0.000 / 10.000 / 0.027 %
Active Host Latency: 0.000 / 1.057 / 0.020 sec
Active Host Execution Time: 0.000 / 5.003 / 0.164 sec
Active Host State Change: 0.000 / 10.000 / 0.027 %
Active Hosts Last 1/5/15/60 min: 688 / 2647 / 2715 / 2715
Passive Host Latency: 0.000 / 0.000 / 0.000 sec
Passive Host State Change: 0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
Hosts Up/Down/Unreach: 2677 / 23 / 17
Hosts Flapping: 0
Hosts In Downtime: 0
Active Host Checks Last 1/5/15 min: 746 / 2786 / 8352
Scheduled: 739 / 2746 / 8212
On-demand: 7 / 40 / 140
Parallel: 739 / 2746 / 8212
Serial: 0 / 0 / 0
Cached: 7 / 40 / 140
Passive Host Checks Last 1/5/15 min: 0 / 0 / 0
Active Service Checks Last 1/5/15 min: 1377 / 7644 / 22825
Scheduled: 1377 / 7644 / 22825
On-demand: 0 / 0 / 0
Cached: 0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 1 / 1 / 1
External Commands Last 1/5/15 min: 2 / 2 / 2
############################################################################
Any ideas?
We are running Nagios XI 5.5.10
Best regards