Re: [Nagios-devel] nagios 3.2.3 localtime deadlock

Post by Guest

On 10/12/2010 07:44 PM, Matthew Kent wrote:
> On Fri, Oct 8, 2010 at 9:08 AM, Matthew Kent wrote:
>> On Fri, Oct 8, 2010 at 3:52 AM, Andreas Ericsson wrote:
>>> On 10/07/2010 08:43 PM, Matthew Kent wrote:
>>>> Hello all,
>>>>
>>>
>>> Hey you. First of all, thanks for including a backtrace. That's really
>>> neat.
>>>
>>
>> Thanks for looking :)
>>
>>>> Setting up a new nagios 3.2.3 install and occasionally (once in 24
>>>> hours) I'm seeing a child deadlock when calling localtime() like so:
>>>>
>>>> (gdb) bt
>>>> #0 0x00000033d5edfade in __lll_lock_wait_private () from /lib64/libc.so.6
>>>> #1 0x00000033d5e8d1cd in _L_lock_1685 () from /lib64/libc.so.6
>>>> #2 0x00000033d5e8cf17 in __tz_convert () from /lib64/libc.so.6
>>>> #3 0x000000000043e23e in get_datetime_string (raw_time=<value optimized out>, buffer=0x2aaab014feb0,
>>>> buffer_length=48, type=0) at utils.c:1696
>>>> #4 0x0000000000430990 in grab_datetime_macro (macro_type=7, arg1=0x0,
>>>> arg2=0x0, output=0x6998f8) at ../common/macros.c:1533
>>>> #5 0x0000000000432cbf in grab_macrox_value (macro_type=-4, arg1=0x0,
>>>> arg2=0x0, output=0x6998f8, free_macro=0x2) at ../common/macros.c:1089
>>>> #6 0x0000000000433586 in set_macrox_environment_vars (set=1) at
>>>> ../common/macros.c:3166
>>>> #7 0x00000000004335bb in set_all_macro_environment_vars (set=1) at
>>>> ../common/macros.c:3134
>>>> #8 0x000000000041b4c3 in run_async_service_check (svc=0x8d62560,
>>>> check_options=<value optimized out>, latency=<value optimized out>,
>>>> scheduled_check=1, reschedule_check=1,
>>>> time_is_valid=<value optimized out>, preferred_time=<value optimized out>) at checks.c:658
>>>> #9 0x000000000041d56d in run_scheduled_service_check (svc=0x8d62560,
>>>> check_options=0, latency=0.68999999999999995) at checks.c:260
>>>> #10 0x000000000042a45a in handle_timed_event (event=0x2aaab011af30) at
>>>> events.c:1257
>>>> #11 0x000000000042abe6 in event_execution_loop () at events.c:1143
>>>> #12 0x0000000000413055 in main (argc=<value optimized out>,
>>>> argv=<value optimized out>, env=0x7fffa0670758) at nagios.c:850
>>>>
>>>> This leaves Nagios completely frozen until I manually kill the child.
>>>>
>>>
>>>
>>> Looking at the glibc code, I see no possible way that a single thread
>>> can hold on to the lock in __tz_convert() for any extended period of
>>> time. What version of glibc are you using?
>>>
>>
>> glibc-2.5-49.el5_5.4.x86_64
>>
>>>> Some light Googling tells me this can happen with localtime in certain
>>>> cases, but I see no indication of other people with this issue in
>>>> Nagios.
>>>>
>>>
>>> Since this seems to happen in the codepath that exports macros as
>>> environment variables, I'd like to know if it happens if you turn
>>> that stuff off. Unless you really, really need it, it's a good idea
>>> to do that anyways, since computing a bazillion macros each time
>>> Nagios runs a check is quite expensive. Set
>>>
>>> use_large_installation_tweaks=1
>>> or
>>> enable_environment_macros=0
>>>
>>> in your nagios.cfg file.
>>>
>>> use_large_installation_tweaks=1 is a really good idea anyways unless
>>> you're running Nagios on Windows 95, where a process' used memory
>>> was never reclaimed by the system unless manually free()'d.
>>>
>>
>> Yeah we don't even use the environment variables. Thanks for all the info.
>>
>>>> It's a pretty standard Nagios install on CentOS 5.5 - except for the
>>>> fact I'm using the mk-livestatus event broker. We have a couple
>>>> thousand checks configured on a pretty aggressive interval.
>>>>
>>>
>>> First try disabling environment macros. Then try without the
>>> mk-livestatus module. Seeing it happen in a pristine Nagios would mean
>>> we don't need to speculate about where the problem happens.
>>
>> Good call. I'll disable the env macros and run it over the weekend,
>> then re-enable them and run with livestatus off for good measure, and
>> report back here. We'll see what happens!
>>
>
> Oops, this was originally supposed to go to the list.
>
>
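The thread does not establish a cause. One mechanism could explain a child stuck in __lll_lock_wait_private() under __tz_convert() even though no single thread should hold that lock for long. That mechanism is fork() in a multithreaded parent. The child gets a copy of every lock in the address space, but only the thread that called fork(). mk-livestatus runs its own threads inside the Nagios process, which is one reason the test without it is informative.

The following is a minimal C sketch under that assumption only. fake_tz_lock, holder_thread() and child_inherits_held_lock() are hypothetical names. An ordinary pthread mutex stands in for glibc's internal lock. Whether glibc 2.5 resets its real tz lock in the child is not something this sketch shows.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the internal libc lock taken by __tz_convert(). */
static pthread_mutex_t fake_tz_lock = PTHREAD_MUTEX_INITIALIZER;

/* Pipes used only to sequence the demo deterministically. */
static int held_pipe[2];    /* worker -> main: "I hold the lock" */
static int release_pipe[2]; /* main -> worker: "you may unlock now" */

/* Plays another thread (e.g. one started by a broker module) that happens
   to be inside localtime(), holding the lock, when fork() is called. */
static void *holder_thread(void *arg)
{
    char c = 'x';
    (void)arg;
    pthread_mutex_lock(&fake_tz_lock);
    (void)write(held_pipe[1], &c, 1);   /* tell main the lock is held */
    (void)read(release_pipe[0], &c, 1); /* block until main lets us go */
    pthread_mutex_unlock(&fake_tz_lock);
    return NULL;
}

/* Returns 1 if the forked child found the lock still held, 0 if it did
   not, -1 on a setup error. */
int child_inherits_held_lock(void)
{
    pthread_t tid;
    pid_t pid;
    char c;
    int status = 0;
    int result;

    if (pipe(held_pipe) != 0 || pipe(release_pipe) != 0)
        return -1;
    if (pthread_create(&tid, NULL, holder_thread, NULL) != 0)
        return -1;
    if (read(held_pipe[0], &c, 1) != 1) /* holder_thread now owns the lock */
        return -1;

    pid = fork();
    if (pid == 0) {
        /* Child: holder_thread does not exist here, so nothing will ever
           unlock fake_tz_lock. trylock reports EBUSY instead of hanging;
           pthread_mutex_lock() would block forever, as in the backtrace. */
        int rc = pthread_mutex_trylock(&fake_tz_lock);
        _exit(rc == EBUSY ? 0 : 1);
    }

    if (pid < 0 || waitpid(pid, &status, 0) != pid)
        result = -1;
    else
        result = (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : 0;

    (void)write(release_pipe[1], &c, 1); /* let the holder thread finish */
    pthread_join(tid, NULL);
    close(held_pipe[0]);
    close(held_pipe[1]);
    close(release_pipe[0]);
    close(release_pipe[1]);
    return result;
}
```

Built with -pthread and called once, child_inherits_held_lock() is expected to return 1. POSIX allows the child of a multithreaded process to call only async-signal-safe functions before exec. localtime() is not one of them. Strictly, neither is the trylock used here; it is only a diagnostic on glibc.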

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]