Re: [Nagios-devel] Memory leak in Nagios head
Posted: Tue Nov 30, 2004 4:45 am
The "repeated SIGHUP" crash occurs in
find_host(temp_hostextinfo->host_name), called from pre_flight_check().
The attached patch makes nagios at least survive the HUPs (even though
the memory leak is still there, so it should crash eventually when it
hits the memory limit). I haven't tested whether this affects the GUI or not.
Note that it has only been tested with Matthew's patch applied as well
(which didn't fix the problem), so I don't know if it will work solo or
if both of them have to be combined to do the trick.
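
For readers without the full diff at hand, here is a minimal sketch of the free-and-reset pattern the patch applies. It assumes the crash comes from a stale pointer being followed after a reload. The struct and the function names (ext_info, ext_info_add, ext_info_free_all) are hypothetical stand-ins, not the real Nagios definitions, and resetting the list head is part of the sketch rather than something the visible part of the patch shows.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the hostextinfo struct. */
struct ext_info {
    char *host_name;
    char *notes;
    struct ext_info *next;
};

/* Prepend a node; returns 0 on success, -1 on allocation failure. */
int ext_info_add(struct ext_info **head, const char *host_name,
                 const char *notes)
{
    struct ext_info *node;

    assert(head != NULL);
    node = calloc(1, sizeof(*node));
    if (node == NULL)
        return -1;
    node->host_name = strdup(host_name);
    node->notes = strdup(notes);
    if (node->host_name == NULL || node->notes == NULL) {
        free(node->host_name);  /* free(NULL) is a no-op */
        free(node->notes);
        free(node);
        return -1;
    }
    node->next = *head;
    *head = node;
    return 0;
}

/* Free every node and reset the head, so that a second call
 * (for example after another HUP) is a harmless no-op. */
void ext_info_free_all(struct ext_info **head)
{
    struct ext_info *cur, *next;

    assert(head != NULL);
    for (cur = *head; cur != NULL; cur = next) {
        next = cur->next;
        free(cur->host_name);
        cur->host_name = NULL;
        free(cur->notes);
        cur->notes = NULL;
        free(cur);
    }
    *head = NULL;  /* nothing stale left for a later lookup to follow */
}
```

Calling ext_info_free_all() twice in a row on the same list is safe in this sketch, because the second call sees an empty list.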
Andreas Ericsson wrote:
> Matthew Kent wrote:
>
>> On Mon, 2004-11-29 at 15:34, Andreas Ericsson wrote:
>>
>>> Matthew Kent wrote:
>>>
>>>> Forwarding this on in case anyone else has seen this behaviour and has
>>>> some suggestions. I'll give it a run through valgrind and see if I can
>>>> spot anything this evening.
>>>>
>>>
>>> Thanks, Matt.
>>>
>>> A small update:
>>>
>>> After having run the daemon for about 10 hours on a test system, memory
>>> consumption has escalated from roughly 1MB to around 24MB. Not very
>>> nice figures. It seems that sending a HUP makes memory consumption
>>> make a small jump (usually around 20K).
>>
>>
>>
>> Well I may have trapped the HUP problem after some passes through
>> valgrind. Seems reset_variables was getting called twice, right after
>> receiving a sighup and immediately after at the start of the main do()
>> loop in nagios.c
>
>
> I'll get to testing right away.
>
>> I've removed the call to it from cleanup() as it's only called when
>> erroring out anyway, and resetting the variables at this point is a bit
>> of a lost cause
>>
>> I also fixed a couple other minor items reported by valgrind. Although I
>> couldn't figure out this last one
>>
>> 64 bytes in 8 blocks are definitely lost in loss record 66 of 118
>> at 0x1B904EDD: malloc (vg_replace_malloc.c:131)
>> by 0x808F4D4: xodtemplate_add_host_to_hostlist (xodtemplate.c:10665)
>> by 0x808F456: xodtemplate_add_hostgroup_members_to_hostlist (xodtemplate.c:10640)
>> by 0x808EF0E: xodtemplate_expand_hostgroups (xodtemplate.c:10434)
>>
>
> This shouldn't be the longstanding problem though, since NSCORE doesn't
> use xodtemplate_expand_hostgroups() on a regular basis. I'm leaning
> towards a very small and subtle in-struct leak in base/checks.c or
> common/statusdata.c (and their underlying functions, naturally).
> Particularly since the problem seems to present itself more rapidly when
> hosts and services change status a lot (or possibly just change their
> plugin output).
>
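
To illustrate the kind of in-struct leak suspected above: if a string member is overwritten on every status change without releasing the old value, each update loses a few bytes. The sketch below shows the non-leaking form. The struct and function names (status_rec, status_set_output) are hypothetical and not taken from base/checks.c or common/statusdata.c; whether the real leak looks like this is unconfirmed.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a status record; not the real Nagios struct. */
struct status_rec {
    char *plugin_output;
};

/* Replace the stored output, releasing the previous string first.
 * Returns 0 on success, or -1 on allocation failure, in which case
 * the old value is left untouched. */
int status_set_output(struct status_rec *rec, const char *output)
{
    char *copy;

    assert(rec != NULL && output != NULL);
    copy = strdup(output);
    if (copy == NULL)
        return -1;
    /* Skipping this free() is the leak: the old buffer would become
     * unreachable as soon as the pointer is overwritten. */
    free(rec->plugin_output);  /* free(NULL) is a no-op on first use */
    rec->plugin_output = copy;
    return 0;
}
```

A leak of this shape would grow with the number of status or output changes, not with uptime alone, which would match the observation that busy hosts and services make the problem show up faster.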
--
Andreas Ericsson [email protected]
OP5 AB www.op5.se
Lead Developer
Attachment: nagios-cvs-HUP-crash.diff
--- ../nagios.orig/common/objects.c Mon Nov 15 13:47:21 2004
+++ common/objects.c Tue Nov 30 13:32:58 2004
@@ -6051,13 +6051,21 @@
for(this_hostextinfo=hostextinfo_list;this_hostextinfo!=NULL;this_hostextinfo=next_hostextinfo){
next_hostextinfo=this_hostextinfo->next;
free(this_hostextinfo->host_name);
+ this_hostextinfo->host_name=NULL;
free(this_hostextinfo->notes);
+ this_hostextinfo->notes=NULL;
free(this_hostextinfo->notes_url);
+ this_hostextinfo->notes_url=NULL;
free(this_hostextinfo->action_url);
+ this_hostextinfo->action_url=NULL;
free(this_hostextinfo->icon_image);
+ this_hostextinfo->icon_image=NULL;
free(this_hostextinfo->vrml_image);
+ this_hostextinfo->vrml_image=NULL;
free(this_hostextinfo->statusmap_image);
+ this_hostextinfo->statusmap_image=NULL;
free(this_hostextinfo->icon_image_alt);
+ this_hostextinfo->icon_image_alt=NULL;
free(this_hostextinfo);
}
@@ -6074,11 +6082,19 @@
for(this_serviceextinfo=serviceextinfo_list;this_serviceextinfo!=NULL;this_serviceextinfo=next_serviceextinfo){
next_serviceextinfo=this_serviceextinfo-
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]