Re: [Nagios-devel] bug: unlocking an invalid mutex


Ethan Galstad wrote:
> Andreas Ericsson wrote:
>> Geert Hendrickx wrote:
>>> Hi,
>>>
>>> I tried to upgrade a Nagios 2.5 system running on NetBSD to Nagios 2.9.
>>> But it seems like a mutex bug has been introduced in Nagios 2.7 (I can
>>> reproduce it with Nagios 2.7 but not with 2.5 and 2.6).
>>>
>>> Unlike Linux, NetBSD's pthread implementation is quite unforgiving of
>>> mutex errors, and aborts a running program, e.g. when it tries to unlock
>>> an invalid mutex. This is what is happening with Nagios:
>>>
>>>> Nagios 2.9 starting... (PID=17620)
>>>> nagios: Error detected by libpthread: Invalid mutex.
>>>> Detected by file "/cvs/src/3/lib/libpthread/pthread_mutex.c", line 334, function "pthread_mutex_unlock".
>>>> See pthread(3) for information.
>>>>
>>>> Program received signal SIGABRT, Aborted.
>>>> [Switching to LWP 1]
>>>> 0xbd9e921f in kill () from /usr/lib/libc.so.12
>>>> (gdb) bt
>>>> #0 0xbd9e921f in kill () from /usr/lib/libc.so.12
>>>> #1 0xbdaa6fb6 in pthread__errorfunc () from /usr/lib/libpthread.so.0
>>>> #2 0xbdaa3d4b in pthread_mutex_unlock () from /usr/lib/libpthread.so.0
>>>> #3 0x080a1651 in xsddefault_save_status_data () at ../xdata/xsddefault.c:338
>>>> #4 0x080a10bd in update_all_status_data () at ../common/statusdata.c:93
>>>> #5 0x080544dc in main (argc=2, argv=0xbfbfe8b8, env=0xbfbfe8c4) at nagios.c:665
>>>> #6 0x0805377d in ___start ()
>>>> (gdb)
>>> The problem is probably in this change between Nagios 2.6 and 2.7:
>>>
>>> --- xdata/xsddefault.c 2006-05-20 21:39:34.000000000 +0200
>>> +++ xdata/xsddefault.c 2007-01-03 03:50:43.000000000 +0100
>>> @@ -322,6 +331,18 @@
>>> return ERROR;
>>> }
>>>
>>> + /* get number of items in the check result buffer */
>>> + pthread_mutex_lock(&service_result_buffer.buffer_lock);
>>> + used_check_result_buffer_slots=service_result_buffer.items;
>>> + high_check_result_buffer_slots=service_result_buffer.high;
>>> + pthread_mutex_unlock(&service_result_buffer.buffer_lock);
>>> +
>>> + /* get number of items in the command buffer */
>>> + pthread_mutex_lock(&external_command_buffer.buffer_lock);
>>> + used_external_command_buffer_slots=external_command_buffer.items;
>>> + high_external_command_buffer_slots=external_command_buffer.high;
>>> + pthread_mutex_unlock(&external_command_buffer.buffer_lock);
>>> +
>>> /* write version info to status file */
>>> fprintf(fp,"########################################\n");
>>> fprintf(fp,"# NAGIOS STATUS FILE\n");
>>>
>>>
>>> Can this please be looked into? Do I need to provide more information?
>>>
>> I suppose just checking for success from the pthread_mutex_lock() calls would
>> be enough, and letting it spinlock for 10 tries if it fails. If it *always* fails,
>> that would be quite horrible though, as it would mean something fairly illegal is
>> going on in there.
>>
>> I'll whip up a patch for it once I'm done with what I'm currently fiddling with.
>>
>
> It looks like the error is occurring in the pthread_mutex_unlock()
> function, which is strange. Checking Google resulted in a couple of
> hits that make it sound like a problem in NetBSD's pthread implementation.
>

I'm not so sure. pthread_mutex_lock() can actually fail, and when it does it's
not defined what state the lock is left in, or how the threading library uses
the mutex internally.
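
For illustration, here is a rough sketch of what checking the return values
could look like, so that pthread_mutex_unlock() is never called on a mutex
we failed to lock. The demo_buffer struct and the read_buffer_stats() helper
are hypothetical stand-ins, not the actual Nagios code; only the field names
are taken from the diff quoted above. It also assumes the mutex has been
properly initialized, since the return value of the lock call cannot be
relied on otherwise.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the Nagios buffer structs; only the fields
   that appear in the quoted diff are modelled here. */
typedef struct {
    int items;
    int high;
    pthread_mutex_t buffer_lock;
} demo_buffer;

/* Copy the counters while holding the lock.
   Returns 0 on success, or the pthread error code if locking or
   unlocking failed. The caller's variables are left untouched when
   the lock could not be taken. */
static int read_buffer_stats(demo_buffer *buf, int *used, int *high)
{
    int rc = pthread_mutex_lock(&buf->buffer_lock);
    if (rc != 0) {
        fprintf(stderr, "mutex lock failed: %s\n", strerror(rc));
        return rc; /* do not unlock a mutex we do not hold */
    }

    *used = buf->items;
    *high = buf->high;

    rc = pthread_mutex_unlock(&buf->buffer_lock);
    if (rc != 0)
        fprintf(stderr, "mutex unlock failed: %s\n", strerror(rc));
    return rc;
}
```

This only reports the failure and skips the unlock; it does not retry, so
there is no risk of spinning on a lock that will never succeed.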

> Does the error still occur if you set "PTHREAD_DIAGASSERT='A'" before
> starting Nagios up? Here's one article that describes how doing so
> fixed a similar error with gftp under NetBSD:
>

Worth a shot, I guess. If nothing else, it's easier than worrying about
ending up in a performance-eating spinlock on the mutex.

--
Andreas Ericsson [email protected]
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231





This post was automatically imported from historical nagios-devel mailing list archives