I'm refreshing our Nagios environment (hardware and OS upgrade). During testing, I found that after roughly 33 hours of daemon uptime, both the host and service latencies begin to increase at a rate of approximately 140 seconds per day (see graphs). The graphs show only the service latencies, but the host latencies follow the same trend.
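
For anyone who wants to check a growth rate like this against their own data, here is a minimal sketch of a least-squares fit over sampled latencies. It assumes the samples are (daemon uptime in seconds, latency in seconds) pairs collected by whatever means you prefer; the function name `latency_growth_per_day` and the sample data are illustrative only and are not part of Nagios.

```python
from typing import List, Tuple


def latency_growth_per_day(samples: List[Tuple[float, float]]) -> float:
    """Return the least-squares slope of latency over time, in seconds per day.

    samples: (uptime_seconds, latency_seconds) pairs (hypothetical input format).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a slope")

    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n

    # Ordinary least-squares slope: covariance(t, y) / variance(t)
    num = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")

    return (num / den) * 86400  # convert seconds-per-second to seconds-per-day


if __name__ == "__main__":
    # Synthetic hourly samples that grow at exactly 140 seconds per day.
    demo = [(h * 3600, 140.0 * h / 24) for h in range(48)]
    print(round(latency_growth_per_day(demo), 3))
```

Fitting only the samples taken after the latency starts to climb keeps the earlier flat period from dragging the slope down.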

I'm out of ideas and would appreciate any pointers. Hardware, OS, and nagios.cfg details are below.
Thank you,
David Nelson
OS: RHEL6 x86_64
Kernels: 2.6.32-279.14.1.el6.x86_64 and 2.6.39-200.24.1.el6uek.x86_64
Hardware: PowerEdge R720 (2 X Six-Core Hyper-Threaded Intel(R) Xeon(R) E5-2640 0 @ 2.50GHz) w/ 16 GB RAM
Filesystems: ext4 and tmpfs
Nagios Cores: 3.3.1, 3.4.4, 3.5.1; all compiled from source
Total Services: 15222 (this will increase to ~18,000 at go-live)
Total Hosts: 908 (this will increase to ~1200 at go-live)
Current nagios.cfg is:
accept_passive_host_checks=1
accept_passive_service_checks=1
additional_freshness_latency=15
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=180
bare_update_check=1
cached_host_check_horizon=30
cached_service_check_horizon=30
cfg_dir=/opt/apps/nagios/etc/groups/unix/etc
cfg_dir=/opt/apps/nagios/etc/groups/unix/hosts
cfg_dir=/opt/apps/nagios/etc/groups/web/etc
cfg_dir=/opt/apps/nagios/etc/objects
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_for_updates=1
check_host_freshness=1
check_result_path=/dev/shm
check_result_reaper_frequency=5
check_service_freshness=1
child_processes_fork_twice=1
command_check_interval=-1
command_file=/opt/apps/nagios-private/var/rw/nagios.cmd
daemon_dumps_core=0
date_format=us
debug_file=/opt/apps/nagios/var/nagios.debug
debug_level=31
debug_verbosity=2
enable_embedded_perl=1
enable_environment_macros=1
enable_event_handlers=0
enable_flap_detection=1
enable_notifications=0
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_broker_options=0
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
external_command_buffer_slots=4096
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=30
host_freshness_check_interval=60
host_inter_check_delay_method=s
illegal_macro_output_chars=`~$&|'"<>
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
interval_length=60
lock_file=/dev/shm/nagios.lock
log_archive_path=/opt/apps/nagios-private/var/archives
log_event_handlers=1
log_external_commands=1
log_file=/opt/apps/nagios-private/var/nagios.log
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=1
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
low_service_flap_threshold=15.0
max_check_result_file_age=3600
max_check_result_reaper_time=30
max_concurrent_checks=0
max_debug_file_size=1000000000
max_host_check_spread=30
max_service_check_spread=30
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
object_cache_file=/dev/shm/objects.cache
obsess_over_hosts=0
obsess_over_services=0
ocsp_timeout=5
p1_file=/opt/apps/nagios/bin/p1.pl
passive_host_checks_are_soft=0
perfdata_timeout=5
precached_object_file=/dev/shm/objects.precache
process_performance_data=0
resource_file=/opt/apps/nagios/etc/resource.cfg
retain_state_information=1
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
retained_host_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_service_attribute_mask=0
retention_update_interval=5
service_check_timeout=30
service_freshness_check_interval=60
service_inter_check_delay_method=s
service_interleave_factor=s
sleep_time=0.25
soft_state_dependencies=0
state_retention_file=/opt/apps/nagios/var/retention.dat
status_file=/dev/shm/status.dat
status_update_interval=10
temp_file=/dev/shm/nagios.tmp
temp_path=/opt/apps/nagios/tmp
translate_passive_host_checks=0
use_aggressive_host_checking=0
use_embedded_perl_implicitly=1
use_large_installation_tweaks=1
use_regexp_matching=0
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=0
use_true_regexp_matching=0
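
If it helps anyone compare this configuration with their own, the listing above can be loaded into a dictionary with a short script. This is a minimal sketch, assuming the main config consists of plain `key=value` lines with `#` comment lines; `parse_nagios_cfg` is a hypothetical helper and not part of Nagios. Directives that can repeat, such as `cfg_dir`, are kept as lists, and only the first `=` splits key from value because some values (for example `illegal_object_name_chars`) contain `=` themselves.

```python
from collections import defaultdict
from typing import Dict, List


def parse_nagios_cfg(text: str) -> Dict[str, List[str]]:
    """Parse key=value lines into a dict mapping each key to a list of values."""
    settings: Dict[str, List[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignore it
        settings[key.strip()].append(value.strip())
    return dict(settings)


if __name__ == "__main__":
    sample = """
# excerpt from the listing above
cfg_dir=/opt/apps/nagios/etc/groups/unix/etc
cfg_dir=/opt/apps/nagios/etc/objects
sleep_time=0.25
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
"""
    cfg = parse_nagios_cfg(sample)
    for key in sorted(cfg):
        print(key, cfg[key])
```

Running the same function over two config files and diffing the resulting dictionaries is a quick way to spot settings that differ between an old and a new installation.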


