[Nagios-devel] Memory leak
Posted: Mon May 16, 2005 9:35 am
Hi,
first, excuse the crosspost - I asked on the user list some time ago,
but without a useful result.
Now, I've got a problem with Nagios 2.0b3 (with b2 as well, but I'm
trying b3 at the moment).
I noticed that the amount of used memory rises without end when Nagios
runs. I was able to find out the following:
- If I've got only one host and one service, memory usage stays constant.
- As soon as I add a second service, it goes up.
- The more services or hosts, or the higher the check frequency, the
faster the memory usage rises.
- No tool I know of (like top or ps) can tell me where the memory goes (or
rather, which process it's used by).
- The memory usage does not go down as soon as I kill the Nagios
process; it can take anywhere from a few hours until the next reboot. If I
start a process that requests more memory than is physically available, i.e.
I force the system to swap, it gets freed.
- If I simply let the system run, the kernel out-of-memory reaper starts
killing processes, though.
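Since top and ps attribute memory only to processes, one way to narrow this down might be to compare two snapshots of /proc/meminfo taken some minutes apart while Nagios runs. The following is a minimal sketch, not something from the original report: it assumes the usual Linux "Name: value kB" layout of /proc/meminfo, and the function names (parse_meminfo, growth, read_meminfo) are hypothetical helpers chosen for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: kilobytes}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def growth(before, after):
    """Return (field, delta_kB) for every changed field, largest increase first."""
    deltas = {
        key: after[key] - before[key]
        for key in after
        if key in before and after[key] != before[key]
    }
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (assumes a Linux system)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Calling read_meminfo() once, waiting a while, calling it again, and passing both results to growth() lists the fields that grew the most. If the growth shows up in fields such as Cached or Slab (where the kernel reports them) rather than in any process, that would be consistent with memory being released only under pressure, though it would not by itself identify the cause.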
I've got the following system:
> elf:~ # uname -a
> Linux elf 2.6.8-24.14-default #1 Tue Mar 29 09:27:43 UTC 2005 i686 athlon i386 GNU/Linux
> elf:/usr/local/nagios # bin/nagios etc/nagios-mini.cfg
>
> Nagios 2.0b3
> Copyright (c) 1999-2005 Ethan Galstad (www.nagios.org)
> Last Modified: 04-03-2005
> License: GPL
>
> Nagios 2.0b3 starting... (PID=20489)
> elf:~ # ldd /usr/local/nagios/bin/nagios
> linux-gate.so.1 => (0xffffe000)
> libm.so.6 => /lib/tls/libm.so.6 (0x4002e000)
> libnsl.so.1 => /lib/libnsl.so.1 (0x40051000)
> libpthread.so.0 => /lib/tls/libpthread.so.0 (0x40068000)
> libltdl.so.3 => /usr/lib/libltdl.so.3 (0x4007a000)
> libc.so.6 => /lib/tls/libc.so.6 (0x40081000)
> /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
> libdl.so.2 => /lib/libdl.so.2 (0x40197000)
so, no embedded Perl, I guess.
The system is a 500MHz athlon, 512 MB RAM, IDE disk which serves as an
all-purpose-server and works fine, so I'm quite sure the OS and the
hardware are more or less ok.
You'll find the configuration I use for testing below.
Now, I assume there is some sort of memory leak, either in Nagios itself
or in the kernel.
I don't think it's the plugins - I tried several, including some
simple shell scripts like 'echo OK; exit 0', and I verified them using
valgrind.
Using valgrind, I do get lots of output - unfortunately, I'm not a
programmer, so it is more or less impossible for me to understand that.
Since Nagios is a very useful project and I've had good experiences with
version 1.x, I'd really like to be able to upgrade to version 2, as well
as help get it running on a wider range of systems.
Now, I assume that usually version 2.0b runs ok, because I see no other
problem reports. I'm wondering if anyone can give me some advice on how to
solve these problems.
Of course, I can supply log files etc. or do test runs with different
configurations.
Arno
----------
Here's my current configuration:
> elf:~ # cat /usr/local/nagios/etc/nagios-mini.cfg
> log_file=/usr/local/nagios/var/nagios.log
> cfg_file=/usr/local/nagios/etc/mini.cfg
> object_cache_file=/usr/local/nagios/var/objects.cache
> resource_file=/usr/local/nagios/etc/resource.cfg
> status_file=/usr/local/nagios/var/status.dat
> nagios_user=nagios
> nagios_group=nagios
> command_check_interval=30s
> command_file=/usr/local/nagios/var/rw/nagios-test.cmd
> comment_file=/usr/local/nagios/var/comments.dat
> downtime_file=/usr/local/nagios/var/downtime.dat
> lock_file=/usr/local/nagios/var/nagios.lock
> temp_file=/usr/local/nagios/var/nagios.tmp
> log_rotation_method=d
> log_archive_path=/usr/local/nagios/var/archives
> use_syslog=0
> log_notifications=1
> log_service_retries=1
> log_host_retries=1
> log_event_handlers=1
> log_initial_states=1
> log_external_commands=1
> log_passive_checks=1
> service_inter_check_delay_method=s
> max_service_check_spread=60
> service_interleave_factor=s
> host_inter_check_delay_method=s
> max_host_check_spread=60
> max_concurrent_checks=20
> service_reaper_frequency=2
> service_check_timeout=30
> host_
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]