> On Fri, Oct 8, 2010 at 9:08 AM, Matthew Kent wrote:
>> On Fri, Oct 8, 2010 at 3:52 AM, Andreas Ericsson wrote:
>>> On 10/07/2010 08:43 PM, Matthew Kent wrote:
>>>> Hello all,
>>>>
>>>
>>> Hey you. First of all, thanks for including a backtrace. That's really
>>> neat.
>>>
>>
>> Thanks for looking.
>>
>>>> Setting up a new nagios 3.2.3 install and occasionally (once in 24
>>>> hours) I'm seeing a child deadlock when calling localtime() like so:
>>>>
>>>> (gdb) bt
>>>> #0 0x00000033d5edfade in __lll_lock_wait_private () from /lib64/libc.so.6
>>>> #1 0x00000033d5e8d1cd in _L_lock_1685 () from /lib64/libc.so.6
>>>> #2 0x00000033d5e8cf17 in __tz_convert () from /lib64/libc.so.6
>>>> #3 0x000000000043e23e in get_datetime_string (raw_time=<value optimized out>, buffer=0x2aaab014feb0,
>>>> buffer_length=48, type=0) at utils.c:1696
>>>> #4 0x0000000000430990 in grab_datetime_macro (macro_type=7, arg1=0x0,
>>>> arg2=0x0, output=0x6998f8) at ../common/macros.c:1533
>>>> #5 0x0000000000432cbf in grab_macrox_value (macro_type=-4, arg1=0x0,
>>>> arg2=0x0, output=0x6998f8, free_macro=0x2) at ../common/macros.c:1089
>>>> #6 0x0000000000433586 in set_macrox_environment_vars (set=1) at
>>>> ../common/macros.c:3166
>>>> #7 0x00000000004335bb in set_all_macro_environment_vars (set=1) at
>>>> ../common/macros.c:3134
>>>> #8 0x000000000041b4c3 in run_async_service_check (svc=0x8d62560,
>>>> check_options=<value optimized out>, latency=<value optimized out>,
>>>> scheduled_check=1, reschedule_check=1,
>>>> time_is_valid=<value optimized out>, preferred_time=<value optimized out>) at checks.c:658
>>>> #9 0x000000000041d56d in run_scheduled_service_check (svc=0x8d62560,
>>>> check_options=0, latency=0.68999999999999995) at checks.c:260
>>>> #10 0x000000000042a45a in handle_timed_event (event=0x2aaab011af30) at
>>>> events.c:1257
>>>> #11 0x000000000042abe6 in event_execution_loop () at events.c:1143
>>>> #12 0x0000000000413055 in main (argc=<value optimized out>,
>>>> argv=<value optimized out>, env=0x7fffa0670758) at nagios.c:850
>>>>
>>>> this leads to Nagios being completely frozen until I manually kill the child.
>>>>
>>>
>>>
>>> Looking at the glibc code, I see no possible way that a single thread
>>> can hold on to the lock in __tz_convert() for any extended period of
>>> time. What version of glibc are you using?
>>>
>>
>> glibc-2.5-49.el5_5.4.x86_64
>>
>>>> Some light Googling tells me this can happen with localtime in certain
>>>> cases, but I see no indication of other people with this issue in
>>>> Nagios.
>>>>
>>>
>>> Since this seems to happen in the codepath that exports macros as
>>> environment variables, I'd like to know if it happens if you turn
>>> that stuff off. Unless you really, really need it, it's a good idea
>>> to do that anyways, since computing a bazillion macros each time
>>> Nagios runs a check is quite expensive. Set
>>>
>>> use_large_installation_tweaks=1
>>> or
>>> enable_environment_macros=0
>>>
>>> in your nagios.cfg file.
>>>
>>> use_large_installation_tweaks=1 is a really good idea anyways unless
>>> you're running Nagios on Windows 95, where a process' used memory
>>> was never reclaimed by the system unless manually free()'d.
>>>
>>
>> Yeah we don't even use the environment variables. Thanks for all the info.
>>
>>>> It's a pretty standard Nagios install on CentOS 5.5 - except for the
>>>> fact I'm using the mk-livestatus event broker. We have a couple
>>>> thousand checks configured on a pretty aggressive interval.
>>>>
>>>
>>> First try disabling environment macros. Then try without the
>>> mk-livestatus module. Seeing it happen in a pristine Nagios would mean
>>> we don't need to speculate about where the problem happens.
>>
>> Good call. I'll disable the env macros and run it over the weekend,
>> then re-enable them with livestatus off for good measure and report
>> back here. We'll see what happens!
>>
>
> Oops, this was originally supposed to go to the list.
>
>
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]