Re: [Nagios-devel] bug: unlocking an invalid mutex


Geert Hendrickx wrote:
> Hi,
>
> I tried to upgrade a Nagios 2.5 system running on NetBSD to Nagios 2.9.
> But it seems like a mutex bug has been introduced in Nagios 2.7 (I can
> reproduce it with Nagios 2.7 but not with 2.5 and 2.6).
>
> Unlike Linux, NetBSD's pthread implementation is quite unforgiving for
> mutex errors, and aborts a running program e.g. when it tries to unlock
> an invalid mutex. This is what is happening with Nagios:
>
>> Nagios 2.9 starting... (PID=17620)
>> nagios: Error detected by libpthread: Invalid mutex.
>> Detected by file "/cvs/src/3/lib/libpthread/pthread_mutex.c", line 334, function "pthread_mutex_unlock".
>> See pthread(3) for information.
>>
>> Program received signal SIGABRT, Aborted.
>> [Switching to LWP 1]
>> 0xbd9e921f in kill () from /usr/lib/libc.so.12
>> (gdb) bt
>> #0 0xbd9e921f in kill () from /usr/lib/libc.so.12
>> #1 0xbdaa6fb6 in pthread__errorfunc () from /usr/lib/libpthread.so.0
>> #2 0xbdaa3d4b in pthread_mutex_unlock () from /usr/lib/libpthread.so.0
>> #3 0x080a1651 in xsddefault_save_status_data () at ../xdata/xsddefault.c:338
>> #4 0x080a10bd in update_all_status_data () at ../common/statusdata.c:93
>> #5 0x080544dc in main (argc=2, argv=0xbfbfe8b8, env=0xbfbfe8c4) at nagios.c:665
>> #6 0x0805377d in ___start ()
>> (gdb)
>
> The problem is probably in this change between Nagios 2.6 and 2.7:
>
> --- xdata/xsddefault.c 2006-05-20 21:39:34.000000000 +0200
> +++ xdata/xsddefault.c 2007-01-03 03:50:43.000000000 +0100
> @@ -322,6 +331,18 @@
> return ERROR;
> }
>
> + /* get number of items in the check result buffer */
> + pthread_mutex_lock(&service_result_buffer.buffer_lock);
> + used_check_result_buffer_slots=service_result_buffer.items;
> + high_check_result_buffer_slots=service_result_buffer.high;
> + pthread_mutex_unlock(&service_result_buffer.buffer_lock);
> +
> + /* get number of items in the command buffer */
> + pthread_mutex_lock(&external_command_buffer.buffer_lock);
> + used_external_command_buffer_slots=external_command_buffer.items;
> + high_external_command_buffer_slots=external_command_buffer.high;
> + pthread_mutex_unlock(&external_command_buffer.buffer_lock);
> +
> /* write version info to status file */
> fprintf(fp,"########################################\n");
> fprintf(fp,"# NAGIOS STATUS FILE\n");
>
>
> Can this please be looked into? Do I need to provide more information?
>
> Thanks,
>
> Geert
>
>
> PS: please keep me Cc'd.
>

Did you by chance have external commands disabled when you got the SIGABRT?

I believe the problem was due to the external_command_buffer.buffer_lock
mutex being accessed even if external commands were disabled (in which
case the mutex wouldn't exist).
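
As an illustration only, here is a minimal sketch of the kind of guard this implies. It is not the committed change. The struct is a hypothetical stand-in for the real buffer type, the field names are taken from the diff quoted above, and check_external_commands is assumed to be the flag that records whether external commands are enabled. The idea is that the lock is only touched when it has actually been initialised.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for the buffer type; field names follow the quoted diff. */
typedef struct {
    int items;
    int high;
    pthread_mutex_t buffer_lock;
} demo_buffer;

static demo_buffer external_command_buffer;

/* Assumed flag: non-zero when external commands are enabled. */
static int check_external_commands = 0;

/* Initialise the counters and the lock only when external commands are enabled. */
static void init_command_buffer(int enabled)
{
    check_external_commands = enabled;
    if (enabled) {
        external_command_buffer.items = 0;
        external_command_buffer.high = 0;
        pthread_mutex_init(&external_command_buffer.buffer_lock, NULL);
    }
}

/*
 * Copy the buffer counters into *used and *high.
 * Returns 1 if the values were read under the lock, or 0 if external
 * commands are disabled, in which case the mutex is never touched and
 * both outputs are set to 0.
 */
static int read_command_buffer_stats(int *used, int *high)
{
    *used = 0;
    *high = 0;

    if (!check_external_commands)
        return 0;

    pthread_mutex_lock(&external_command_buffer.buffer_lock);
    *used = external_command_buffer.items;
    *high = external_command_buffer.high;
    pthread_mutex_unlock(&external_command_buffer.buffer_lock);
    return 1;
}
```

With the flag off, read_command_buffer_stats() returns without calling pthread_mutex_lock() or pthread_mutex_unlock(), so a strict pthread implementation has no uninitialised mutex to complain about.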

A fix has been committed to CVS (both the 2.x and HEAD branches). When
you get a chance, test the new 2.x CVS code and see if it solves the
problem.


Ethan Galstad,
Nagios Developer
---
Email: [email protected]
Website: http://www.nagios.org





This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]