Re: [Nagios-devel] nagios 3.2.3 localtime deadlock

On 10/08/2010 07:44 AM, Thomas Guyot-Sionnest wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 10-10-07 02:43 PM, Matthew Kent wrote:
>> Hello all,
>>
>> Setting up a new nagios 3.2.3 install and occasionally (once in 24
>> hours) I'm seeing a child deadlock when calling localtime() like so:
>>
>> (gdb) bt
>> #0 0x00000033d5edfade in __lll_lock_wait_private () from /lib64/libc.so.6
>> #1 0x00000033d5e8d1cd in _L_lock_1685 () from /lib64/libc.so.6
>> #2 0x00000033d5e8cf17 in __tz_convert () from /lib64/libc.so.6
>> #3 0x000000000043e23e in get_datetime_string (raw_time=<value optimized out>, buffer=0x2aaab014feb0,
>> buffer_length=48, type=0) at utils.c:1696
>> #4 0x0000000000430990 in grab_datetime_macro (macro_type=7, arg1=0x0,
>> arg2=0x0, output=0x6998f8) at ../common/macros.c:1533
>> #5 0x0000000000432cbf in grab_macrox_value (macro_type=-4, arg1=0x0,
>> arg2=0x0, output=0x6998f8, free_macro=0x2) at ../common/macros.c:1089
>> #6 0x0000000000433586 in set_macrox_environment_vars (set=1) at
>> ../common/macros.c:3166
>> #7 0x00000000004335bb in set_all_macro_environment_vars (set=1) at
>> ../common/macros.c:3134
>> #8 0x000000000041b4c3 in run_async_service_check (svc=0x8d62560,
>> check_options=<value optimized out>, latency=<value optimized out>,
>> scheduled_check=1, reschedule_check=1,
>> time_is_valid=<value optimized out>, preferred_time=<value optimized out>) at checks.c:658
>> #9 0x000000000041d56d in run_scheduled_service_check (svc=0x8d62560,
>> check_options=0, latency=0.68999999999999995) at checks.c:260
>> #10 0x000000000042a45a in handle_timed_event (event=0x2aaab011af30) at
>> events.c:1257
>> #11 0x000000000042abe6 in event_execution_loop () at events.c:1143
>> #12 0x0000000000413055 in main (argc=<value optimized out>,
>> argv=<value optimized out>, env=0x7fffa0670758) at nagios.c:850
>>
>> this leads to Nagios being completely frozen until I manually kill the child.
>>
>> Some light Googling tells me this can happen with localtime in certain
>> cases, but I see no indication of other people with this issue in
>> Nagios.
>>
>> It's a pretty standard Nagios install on CentOS 5.5 - except for the
>> fact I'm using the mk-livestatus event broker. We have a couple
>> thousand checks configured on a pretty aggressive interval.
>>
>> Anyone seen this before?
>
> I'm far from being an expert in threading and locking, but afaik
> localtime(), located at utils.c:1696, like other similar time functions,
> is not thread safe. I'm wondering if using the _r versions would help...
>

It's not threadsafe because concurrent calls write to the same static
struct, so if two threads attempt to get a time representation à la
struct tm, they may overwrite each other's data and get the time fscked
up. In Nagios, that's not a huge issue and really shouldn't matter for
scheduling, since we use timestamps for that, and getting the current
timestamp is an atomic (and thus implicitly threadsafe) operation.
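
To make the shared-struct point concrete, here is a minimal sketch of the
_r approach. It is not Nagios code: format_timestamp_r() is a hypothetical
helper, loosely modelled on the (buffer, buffer_length) arguments visible
in the get_datetime_string() frame above, and the output format is an
assumption. localtime_r() fills a struct tm owned by the caller, so two
threads can no longer overwrite each other's result. Whether that has any
effect on the lock wait shown in the backtrace is a separate question that
this sketch does not answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, not part of Nagios: format raw_time as
 * "YYYY-MM-DD HH:MM:SS" in the local timezone.
 * localtime_r() writes into tm_local, which lives on this thread's stack,
 * instead of the shared static struct that localtime() returns.
 * Returns the number of characters written, or 0 on failure. */
size_t format_timestamp_r(time_t raw_time, char *buffer, size_t buffer_length)
{
    struct tm tm_local;

    assert(buffer != NULL && buffer_length > 0);
    memset(&tm_local, 0, sizeof(tm_local));

    if (localtime_r(&raw_time, &tm_local) == NULL) {
        buffer[0] = '\0';
        return 0;
    }

    /* strftime() returns 0 if the result does not fit in the buffer. */
    return strftime(buffer, buffer_length, "%Y-%m-%d %H:%M:%S", &tm_local);
}
```

A caller would pass its own buffer, e.g. char buf[48]; followed by
format_timestamp_r(time(NULL), buf, sizeof(buf));, and check for a return
value of 0 before using the string.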

> At first glance it seems we might have quite some code to change in
> order to be 100% thread-safe:
>

We only need to change the places where we convert something other than
"now" into a struct tm with either of those functions. Where every caller
is converting "now", it doesn't matter if they start overwriting each other.

> $ grep -RE '(asctime|ctime|gmtime|localtime)[[:space:]]*\(' base/|wc -l
> 77
>
> Although not all invocations are necessarily in threaded code. Anyone
> more experienced could confirm if this is the actual issue?
>

It would be far better to remove threading from Nagios altogether and
use worker daemons to perform the actual checks, but that's a much
larger change.

--
Andreas Ericsson [email protected]
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]