[Nagios-devel] Multi-Threaded Nagios keeps on truckin?

This is odd.
I started Nagios with my threading patch under GDB and ran it for an hour or so, then went to lunch and came back.
Looking at the log and the screen, everything looked normal, but I found this at about 2:30 of runtime:

[New Thread 1871707040 (LWP 27875)]
*** glibc detected *** /usr/local/nagios/bin/nagios: double free or corruption (fasttop): 0xb3b17710 ***
======= Backtrace: =========
/lib/libc.so.6[0xb7eb2db2]
/lib/libc.so.6(__libc_free+0x84)[0xb7eb4414]
/usr/local/nagios/bin/nagios(free_memory+0x1e3)[0x807af33]
/usr/local/nagios/bin/nagios(my_system+0x223)[0x80773af]
/usr/local/nagios/bin/nagios(run_host_check+0x2ea)[0x8059966]
/usr/local/nagios/bin/nagios(check_host+0x2db)[0x8058fa4]
/usr/local/nagios/bin/nagios(verify_route_to_host+0x2b)[0x8058904]
/usr/local/nagios/bin/nagios(reap_service_checks+0xac6)[0x8057884]
/lib/libpthread.so.0[0xb7f7b13b]
/lib/libc.so.6(__clone+0x5e)[0xb7f0cfbe]


I would have assumed that the application would have stopped at this point, but it appears to have just shaken it off and continued.
In fact it's still going. It continued as normal for what I estimate to be over an hour more, spawning 3, 4, even 5 threads to handle the service reaper. Finally, though, it appears that the check results buffer is no longer being filled by anything, because I see reaper threads being spawned and instantly exiting.
Stepping through the reaper process shows that the buffer is empty every time.
Since this is a DNX-based setup, it would appear that the DNX Collector has gone deaf, but I think it may be something else.
It's possible that the segfault occurred while holding the mutex for the results buffer, thereby preventing the DNX Collector from writing to it; however, upon examination the mutex appears to be unlocked (__lock = 0).
I'm going to keep looking, but in the meantime, if anyone has any other ideas on what I might want to check here, it would be very much appreciated.
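
For anyone who wants to test the held-mutex theory from code rather than by reading __lock in GDB, here is a minimal sketch of a probe built on pthread_mutex_trylock. The names results_buffer_lock and mutex_is_held are hypothetical stand-ins, not identifiers from Nagios or DNX, and the probe assumes a default (non-recursive) mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the lock guarding the check results buffer. */
static pthread_mutex_t results_buffer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 if the mutex is currently held, 0 if it was free,
 * and -1 on any other error. The mutex is left in its original state.
 * Assumes a default (non-recursive) mutex. */
int mutex_is_held(pthread_mutex_t *m)
{
    int rc;

    assert(m != NULL);
    rc = pthread_mutex_trylock(m);
    if (rc == 0) {
        /* We got the lock, so nobody held it; release it again. */
        pthread_mutex_unlock(m);
        return 0;
    }
    return (rc == EBUSY) ? 1 : -1;
}
```

The answer is only a snapshot: another thread can take or release the lock right after the probe returns, so this is useful for debugging a suspected stuck lock, not for synchronization.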

I find it very puzzling that a segfault 30 or 40 minutes ago would only now cause the results buffer to remain empty.
It makes me wonder if we even have a real correlation here or if something else is at play.

Sincerely,
Steve

P.S. The solution to the segfault issue itself was to make the free_memory function thread safe, which I have now done.
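
To illustrate the general idea (this is not the actual patch), here is a minimal sketch of a free routine that cannot double free when several threads call it. The names shared_buffer, free_lock and free_shared_buffer are hypothetical; the real free_memory releases different data. The two ingredients are a mutex around the free and resetting the pointer to NULL afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared state standing in for what free_memory() releases. */
static char *shared_buffer = NULL;
static pthread_mutex_t free_lock = PTHREAD_MUTEX_INITIALIZER;

/* Free the buffer at most once, however many threads call this.
 * Returns 1 if this call did the free, 0 if there was nothing to free. */
int free_shared_buffer(void)
{
    int freed = 0;
    int rc = pthread_mutex_lock(&free_lock);

    assert(rc == 0);
    (void)rc; /* silence unused-variable warnings when NDEBUG is set */

    if (shared_buffer != NULL) {
        free(shared_buffer);
        shared_buffer = NULL; /* later callers see NULL and skip the free */
        freed = 1;
    }
    pthread_mutex_unlock(&free_lock);
    return freed;
}
```

Without the lock, two threads can both pass the NULL check before either resets the pointer, which is one way to end up with the "double free or corruption" report in the backtrace above.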



This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]