Something I found strange is that after a restart, the average service check latency goes through the roof (15 minutes). It makes sense that Nagios does its best to schedule things in a way that is friendly to both the local and the remote hosts. However, why is it rescheduling checks as if they had never been run before the restart, when we try to retain as much state as possible? Here is the relevant configuration:
# Checks age management
check_result_path=/srv/nagios/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
# AUTO RESCHEDULING OPTIONS (EXPERIMENTAL, use with CAUTION)
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
# SOME TIMINGS
sleep_time=0.25
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
# Performance tuning
service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
check_result_reaper_frequency=10
max_check_result_reaper_time=30
retain_state_information=1
state_retention_file=/var/log/nagios/retention.dat
retention_update_interval=1
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
# Status file
status_file=/var/log/nagios/status.dat
status_update_interval=120
Is there anything I'm missing here, or can Nagios 3 simply not avoid this behavior?
Here's some more information from a nagios -s run:
Timing information on object configuration processing is listed
below. You can use this information to see if precaching your
object configuration would be useful.
Object Config Source: Config files (uncached)
OBJECT CONFIG PROCESSING TIMES (* = Potential for precache savings with -u option)
----------------------------------
Read: 1.100689 sec
Resolve: 0.196292 sec *
Recomb Contactgroups: 0.031873 sec *
Recomb Hostgroups: 0.172127 sec *
Dup Services: 0.163431 sec *
Recomb Servicegroups: 0.009459 sec *
Duplicate: 0.210780 sec *
Inherit: 0.036253 sec *
Recomb Contacts: 0.000000 sec *
Sort: 0.000000 sec *
Register: 0.362647 sec
Free: 0.063051 sec
============
TOTAL: 2.346605 sec * = 0.820218 sec (34.95%) estimated savings
RETENTION DATA TIMES
----------------------------------
Read and Process: 4.833108 sec
============
TOTAL: 4.833108 sec
Timing information on configuration verification is listed below.
CONFIG VERIFICATION TIMES (* = Potential for speedup with -x option)
----------------------------------
Object Relationships: 0.356153 sec
Circular Paths: 0.000000 sec *
Misc: 0.045195 sec
============
TOTAL: 0.401348 sec * = 0.000000 sec (0.0%) estimated savings
EVENT SCHEDULING TIMES
-------------------------------------
Get service info: 0.278038 sec
Get host info info: 0.000542 sec
Get service params: 0.000002 sec
Schedule service times: 0.261014 sec
Schedule service events: 46.465771 sec
Get host params: 0.000000 sec
Schedule host times: 0.008024 sec
Schedule host events: 9.417014 sec
============
TOTAL: 56.430405 sec
Projected scheduling information for host and service checks
is listed below. This information assumes that you are going
to start running Nagios with your current config files.
HOST SCHEDULING INFORMATION
---------------------------
Total hosts: 5368
Total scheduled hosts: 5367
Host inter-check delay method: SMART
Average host check interval: 300.00 sec
Host inter-check delay: 0.06 sec
Max host check spread: 30 min
First scheduled check: Fri Mar 24 11:08:52 2017
Last scheduled check: Fri Mar 24 11:08:52 2017
SERVICE SCHEDULING INFORMATION
-------------------------------
Total services: 126026
Total scheduled services: 87914
Service inter-check delay method: SMART
Average service check interval: 1102.51 sec
Inter-check delay: 0.01 sec
Interleave factor method: SMART
Average services per host: 23.48
Service interleave factor: 17
Max service check spread: 30 min
First scheduled check: Fri Mar 24 11:09:56 2017
Last scheduled check: Fri Mar 24 11:27:26 2017
CHECK PROCESSING INFORMATION
----------------------------
Check result reaper interval: 20 sec
Max concurrent service checks: Unlimited
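
For what it's worth, the SMART figures in the output above can be reproduced by hand. The following is a minimal Python sketch, assuming the SMART inter-check delay is the average check interval divided by the number of scheduled checks (capped so that all checks fit inside the max check spread), and that the SMART interleave factor is the ceiling of scheduled services per host. Those formulas are my reading of the Nagios documentation, not something taken from the source code, and the helper name smart_delay is made up for illustration.

```python
import math

# Figures copied from the nagios -s output above.
total_hosts = 5368
scheduled_hosts = 5367
avg_host_interval = 300.00        # seconds
total_services = 126026
scheduled_services = 87914
avg_service_interval = 1102.51    # seconds
max_spread = 30 * 60              # max_*_check_spread=30 minutes, in seconds


def smart_delay(avg_interval, scheduled, max_spread_seconds):
    """Assumed SMART rule: interval / scheduled checks, capped by the max spread."""
    delay = avg_interval / scheduled
    return min(delay, max_spread_seconds / scheduled)


host_delay = smart_delay(avg_host_interval, scheduled_hosts, max_spread)
service_delay = smart_delay(avg_service_interval, scheduled_services, max_spread)

# Assumed SMART interleave rule: ceiling of scheduled services per host.
interleave = math.ceil(scheduled_services / total_hosts)

# Time needed to start every scheduled service check once at that delay.
service_window = service_delay * scheduled_services

print(f"Host inter-check delay:    {host_delay:.2f} sec")
print(f"Service inter-check delay: {service_delay:.2f} sec")
print(f"Average services per host: {total_services / total_hosts:.2f}")
print(f"Service interleave factor: {interleave}")
print(f"Initial service window:    {service_window / 60:.1f} min")
```

Under these assumptions the numbers line up with what nagios -s reports (0.06 sec, 0.01 sec, 23.48 and 17). The computed window of roughly 18 minutes is close to the 17.5 minutes between the first and last scheduled service check, which is what I would expect if all service checks are being spread out from scratch.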