[Nagios-devel] Multi-Threaded Nagios keeps on truckin?

Posted: Wed Sep 09, 2009 8:00 pm
by Guest
This is odd.
I started Nagios with my threading patch under GDB and ran it for an hour or so; eventually I went to lunch and came back.
Looking at the log and the screen, everything looked normal, but I found this at about 2:30 of runtime:

[New Thread 1871707040 (LWP 27875)]
*** glibc detected *** /usr/local/nagios/bin/nagios: double free or corruption (fasttop): 0xb3b17710 ***
======= Backtrace: =========
/lib/libc.so.6[0xb7eb2db2]
/lib/libc.so.6(__libc_free+0x84)[0xb7eb4414]
/usr/local/nagios/bin/nagios(free_memory+0x1e3)[0x807af33]
/usr/local/nagios/bin/nagios(my_system+0x223)[0x80773af]
/usr/local/nagios/bin/nagios(run_host_check+0x2ea)[0x8059966]
/usr/local/nagios/bin/nagios(check_host+0x2db)[0x8058fa4]
/usr/local/nagios/bin/nagios(verify_route_to_host+0x2b)[0x8058904]
/usr/local/nagios/bin/nagios(reap_service_checks+0xac6)[0x8057884]
/lib/libpthread.so.0[0xb7f7b13b]
/lib/libc.so.6(__clone+0x5e)[0xb7f0cfbe]


I would have assumed that the application would have stopped at this point, but it appears to have just shaken it off and continued.
In fact it's still going. It continued as normal for what I estimate to be over an hour more, spawning 3, 4, even 5 threads to handle the service reaper, but now it appears that the check results buffer is no longer being filled by anything, because I see reaper threads being spawned and instantly exiting.
Stepping through the reaper process shows that the buffer is empty every time.
Since this is a DNX-based setup, it would appear that the DNX Collector has gone deaf, but I think it may be something else.
It's possible that the segfault occurred while holding the mutex for the results buffer, thereby preventing the DNX collector from writing to it; however, upon examination the mutex appears to be unlocked (__lock = 0).
I'm going to keep looking, but in the meantime, if anyone has any other ideas on what I might want to check here, it would be very much appreciated.
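
As an aside on the mutex check above: besides reading __lock in GDB, the same question can be asked from code with pthread_mutex_trylock(). The following is only a rough sketch under that assumption. The helper name mutex_was_free is made up for illustration and is not part of Nagios or DNX, and the answer is only a snapshot, since another thread can take the lock right after the probe.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper (not a Nagios or DNX function).
 * Probes a mutex without blocking:
 *   returns  1 if the mutex could be acquired (nobody held it),
 *   returns  0 if it was busy (EBUSY),
 *   returns -1 on any other error.
 * The mutex is released again before returning, so this is only a
 * point-in-time observation, useful for debugging, not for locking. */
static int mutex_was_free(pthread_mutex_t *m)
{
    int rc;

    assert(m != NULL);
    rc = pthread_mutex_trylock(m);
    if (rc == 0) {
        pthread_mutex_unlock(m);
        return 1;
    }
    return (rc == EBUSY) ? 0 : -1;
}
```

Calling something like this on the results-buffer mutex from a debug hook would show whether a dead thread left it locked.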

I find it very puzzling that a segfault 30 or 40 minutes ago would only now cause the results buffer to remain empty.
It makes me wonder if we even have a real correlation here or if something else is at play.

Sincerely,
Steve

p.s. The solution to the segfault issue itself was to make the free_memory function thread safe, which I have now done.
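
To illustrate the general idea of a thread-safe free (this is not the actual patch, just a minimal sketch), one common approach is to guard the free() with a mutex and null the pointer afterwards, so a second thread that reaches the same pointer finds NULL and skips it. The names free_lock and safe_free are hypothetical and do not exist in Nagios.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical lock and helper; neither exists in Nagios. */
static pthread_mutex_t free_lock = PTHREAD_MUTEX_INITIALIZER;

/* Release *slot at most once, even if several threads race here.
 * Returns 1 if this call freed the memory, 0 if it was already freed. */
static int safe_free(void **slot)
{
    int freed = 0;

    assert(slot != NULL);
    pthread_mutex_lock(&free_lock);
    if (*slot != NULL) {
        free(*slot);
        *slot = NULL;   /* later callers see NULL and skip free() */
        freed = 1;
    }
    pthread_mutex_unlock(&free_lock);
    return freed;
}
```

This only helps when every thread frees through the same pointer variable; if two threads each hold their own copy of the address, the double free can still happen and the ownership of the buffer has to be sorted out instead.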


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]