Suspected Memory Leak

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
cmart
Posts: 1
Joined: Tue Nov 27, 2012 4:03 am


Post by cmart

Hello,

(Host spec is at the bottom.)

I'm supporting a production Nagios instance (v3.2.1, which I appreciate is an old version), and I'm starting to see what looks like a memory leak when I increase the maximum concurrent checks setting (max_concurrent_checks) in nagios.cfg. For 18 months our instance has happily supported ~3,000 service checks across ~400 hosts with max_concurrent_checks=200. When check latency climbed to 15 seconds and the log started showing service checks being nudged (rescheduled), I raised the limit to 400, and since then our instance has crashed every few weeks.
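To put a number on the latency before and after changing max_concurrent_checks, the per-service latency can be read straight out of status.dat. The sketch below assumes the Nagios 3.x status.dat layout, where each service appears in a `servicestatus { ... }` block containing a `check_latency=<seconds>` line; the path is the status_file from the nagios.cfg posted below. It is a rough helper, not a Nagios tool (the bundled nagiostats utility also reports latency).

```python
"""Summarise service check latency from a Nagios 3.x status.dat file.

Assumption: each service is a 'servicestatus { ... }' block that contains a
'check_latency=<seconds>' line, as in the Nagios 3.x status file format.
"""
import os
import statistics


def service_latencies(text):
    """Return check_latency values (seconds) from every servicestatus block."""
    latencies = []
    in_service = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("servicestatus") and line.endswith("{"):
            in_service = True
        elif line == "}":
            in_service = False
        elif in_service and line.startswith("check_latency="):
            latencies.append(float(line.split("=", 1)[1]))
    return latencies


def summarise(latencies):
    """Return (count, mean, max) latency, or zeros if nothing was found."""
    if not latencies:
        return (0, 0.0, 0.0)
    return (len(latencies), statistics.mean(latencies), max(latencies))


if __name__ == "__main__":
    path = "/var/cache/nagios3/status.dat"  # status_file from the posted nagios.cfg
    if os.path.exists(path):
        with open(path, encoding="utf-8", errors="replace") as fh:
            count, mean, worst = summarise(service_latencies(fh.read()))
        print(f"{count} services, mean latency {mean:.2f}s, max {worst:.2f}s")
    else:
        print(f"{path} not found")
```

Host blocks (hoststatus) are skipped on purpose, so the figures reflect service checks only.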

Looking at memory usage on the Nagios host, I can see it growing by about 3GB every two weeks (it seems to crash at around 5GB), and this has been happening ever since the limit was raised. For the moment I have lowered the limit back to 200, and the instance is now stable.
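Host-wide memory figures do not by themselves show which process is growing; on this box it could also be the MySQL server behind the NDOUtils broker module. A minimal sketch for sampling the Nagios daemon's own resident memory on Linux follows. It assumes /proc is available and that the daemon's PID is stored in the lock_file from the posted config (/var/run/nagios3/nagios3.pid); run it periodically (for example from cron) and compare the samples.

```python
"""Sample the resident memory (VmRSS) of the running Nagios daemon on Linux.

Assumptions: Linux /proc is available and the daemon PID is in the lock_file
from the posted nagios.cfg.
"""
import os
import time

PID_FILE = "/var/run/nagios3/nagios3.pid"  # lock_file from the posted nagios.cfg


def parse_vmrss_kib(status_text):
    """Return VmRSS in KiB from the contents of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # line looks like "VmRSS:   123456 kB"
    return None


def growth_per_day(first_kib, last_kib, elapsed_seconds):
    """Average RSS growth in KiB per day between two samples."""
    return (last_kib - first_kib) * 86400 / elapsed_seconds


def sample_nagios_rss():
    """Read the daemon PID from PID_FILE and return its current VmRSS in KiB."""
    with open(PID_FILE) as fh:
        pid = int(fh.read().strip())
    with open(f"/proc/{pid}/status") as fh:
        return parse_vmrss_kib(fh.read())


if __name__ == "__main__":
    if os.path.exists(PID_FILE):
        # One "<unix time> <VmRSS KiB>" line per run; collect these over days.
        print(f"{int(time.time())} {sample_nagios_rss()}")
    else:
        print(f"{PID_FILE} not found; is Nagios running?")
```

The reported growth of roughly 3GB per two weeks works out to a little over 200MB per day. If growth_per_day over the collected samples shows the daemon itself growing at about that rate, that would point at Nagios (or the loaded broker module, which runs inside the Nagios process); a flat VmRSS would suggest looking elsewhere on the host.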

I've looked on the Nagios docs site (http://nagios.sourceforge.net/docs/3_0/tuning.html) and can't find any advice on the relationship between max_concurrent_checks and memory usage, so I'm posting here for advice or suggestions. I suspect other settings in the Nagios configuration may also need changing, but I'm reluctant to touch them, as this is a production instance and I'm not 100% certain what all the configuration options do.
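One way to sanity-check a value for max_concurrent_checks (this is Little's law from queueing theory, not advice taken from the Nagios docs) is: average checks in flight ≈ check rate × average check execution time. The post does not give the per-service check interval or the average execution time, so the 5-minute interval and 20-second average below are purely hypothetical inputs; only the ~3,000 service count and interval_length=60 come from the post and config.

```python
"""Rough estimate of concurrent service checks via Little's law (L = rate * W).

All inputs apart from the service count and interval_length are hypothetical
and should be replaced with measured values.
"""
import math


def checks_per_second(service_count, check_interval_units, interval_length=60):
    """Scheduled check rate; interval_length is seconds per interval unit, as in nagios.cfg."""
    return service_count / (check_interval_units * interval_length)


def estimated_concurrency(service_count, check_interval_units, avg_exec_seconds,
                          interval_length=60):
    """Average number of checks in flight, rounded up."""
    rate = checks_per_second(service_count, check_interval_units, interval_length)
    return math.ceil(rate * avg_exec_seconds)


if __name__ == "__main__":
    # ~3000 services (from the post); 5-unit interval and 20 s average are assumptions.
    print(estimated_concurrency(3000, 5, 20))
```

With those assumed inputs the estimate lands right at 200, which is the kind of situation where checks would queue and latency would rise. The estimate says nothing about memory on its own; it only indicates whether the limit is plausibly a bottleneck. Average execution time can be measured from the check_execution_time values that status.dat should also record per service.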

I'm planning to clone this to a VM so that I can test the upgrade process, as we're overdue, and an upgrade may well fix it; however, I thought I'd check here too to cover all bases. I should also mention that the load average only reaches about 3 and there's plenty of CPU headroom.

Here's my current nagios.cfg:


cfg_file=/etc/nagios3/hostTemplates.cfg
cfg_file=/etc/nagios3/hosts.cfg
cfg_file=/etc/nagios3/serviceTemplates.cfg
cfg_file=/etc/nagios3/services.cfg
cfg_file=/etc/nagios3/misccommands.cfg
cfg_file=/etc/nagios3/checkcommands.cfg
cfg_file=/etc/nagios3/contactgroups.cfg
cfg_file=/etc/nagios3/contacts.cfg
cfg_file=/etc/nagios3/hostgroups.cfg
cfg_file=/etc/nagios3/servicegroups.cfg
cfg_file=/etc/nagios3/timeperiods.cfg
cfg_file=/etc/nagios3/escalations.cfg
cfg_file=/etc/nagios3/dependencies.cfg
cfg_file=/etc/nagios3/meta_commands.cfg
cfg_file=/etc/nagios3/meta_contact.cfg
cfg_file=/etc/nagios3/meta_contactgroup.cfg
cfg_file=/etc/nagios3/meta_dependencies.cfg
cfg_file=/etc/nagios3/meta_escalations.cfg
cfg_file=/etc/nagios3/meta_host.cfg
cfg_file=/etc/nagios3/meta_hostgroup.cfg
cfg_file=/etc/nagios3/meta_services.cfg
cfg_file=/etc/nagios3/meta_timeperiod.cfg
resource_file=/etc/nagios3/resource.cfg
log_file=/var/log/nagios3/nagios.log
status_file=/var/cache/nagios3/status.dat
object_cache_file=/var/cache/nagios3/objects.cache
temp_file=/var/cache/nagios3/nagios.tmp
p1_file=/usr/lib/nagios3/p1.pl
nagios_user=nagios
nagios_group=nagios
enable_notifications=1
execute_service_checks=1
accept_passive_service_checks=1
enable_event_handlers=1
log_rotation_method=d
log_archive_path=/var/log/nagios3/archives/
check_external_commands=1
command_check_interval=1s
command_file=/var/lib/nagios3/rw/nagios.cmd
lock_file=/var/run/nagios3/nagios3.pid
retain_state_information=1
state_retention_file=/var/lib/nagios3/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=0
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=1
log_external_commands=1
sleep_time=1
service_inter_check_delay_method=s
service_interleave_factor=s
max_concurrent_checks=200
max_service_check_spread=5
check_result_reaper_frequency=5
interval_length=60
enable_flap_detection=1
low_service_flap_threshold=25.0
high_service_flap_threshold=50.0
low_host_flap_threshold=25.0
high_host_flap_threshold=50.0
soft_state_dependencies=0
service_check_timeout=60
host_check_timeout=10
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
ochp_timeout=5
perfdata_timeout=5
obsess_over_services=0
process_performance_data=1
service_perfdata_command=process-service-perfdata
service_perfdata_file_mode=2
check_for_orphaned_services=0
check_service_freshness=1
date_format=euro
illegal_object_name_chars=~!$%^"&*|'<>?,()=
illegal_macro_output_chars=`~$^"&|'<>
admin_email=admin
admin_pager=admin@localhost
broker_module=/usr/lib/ndoutils/ndomod-mysql-3x.o config_file=/etc/nagios3/ndomod.cfg
event_broker_options=-1
debug_level=0
debug_verbosity=2
use_aggressive_host_checking=0

Here's the host spec:


OS: Ubuntu Server 10.10
Server: Dell PowerEdge R610
CPU: 2x Intel Xeon E5620 2.40GHz quad-core
Memory: 8GB 1066MHz
HDD: 2x 146GB SAS 6Gbps 15K, mirrored (RAID 1)


Any advice is welcome.
Chris
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: Suspected Memory Leak

Post by agriffin

Most Nagios Core developers do not frequent this forum. You should post this to the mailing lists and/or bug tracker.