Re: [Nagios-devel] Nagios stop hangs in FUTEX_WAIT

Ethan Galstad wrote:
> Herbert Straub wrote:
>> If I try to stop Nagios with /etc/init.d/nagios stop on Fedora Core 4/6
>> with Nagios 2.4 and 2.7, I get the message:
>>
>> Warning - running nagios did not exit in time
>>
>> The nagios process hangs in a futex wait, for example:
>>
>> [root@xen1 ~]# strace -p 11620
>> Process 11620 attached - interrupt to quit
>> futex(0x2aaaabf15980, FUTEX_WAIT, 2, NULL
>>
>> This does not happen on every stop, but on 60% of the stop attempts. I
>> built nagios with debugging info, attached to the hanging process with
>> gdb, and saw three threads with the following stack traces:
>>
>> thread 1:
>>
>> #0 0x0000003663ad9298 in __lll_mutex_lock_wait () from /lib64/libc.so.6
>> #1 0x0000003663a730e8 in _L_lock_14830 () from /lib64/libc.so.6
>> #2 0x0000003663a723ab in realloc () from /lib64/libc.so.6
>> #3 0x0000003663a66224 in _IO_mem_finish () from /lib64/libc.so.6
>> #4 0x0000003663a5e2ef in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
>> #5 0x0000003663ac9bf1 in __vsyslog_chk () from /lib64/libc.so.6
>> #6 0x0000003663aca120 in syslog () from /lib64/libc.so.6
>> #7 0x0000000000424227 in write_to_syslog (buffer=0x7fffa9aaaeb0 "Caught SIGTERM, shutting down...\n", data_type=64) at logging.c:229
>> #8 0x00000000004248c9 in write_to_all_logs (buffer=0x7fffa9aaaeb0 "Caught SIGTERM, shutting down...\n", data_type=64) at logging.c:123
>> #9 0x000000000042b09e in sighandler (sig=) at utils.c:3410
>> #10 <signal handler called>
>> #11 0x0000003663a94809 in fork () from /lib64/libc.so.6
>> #12 0x000000000042f8b2 in my_system (cmd=0x7fffa9aac6b0 "/usr/local/share/nagios2/eventhandlers/process_perfdata.pl", timeout=5, early_timeout=0x7fffa9aacebc, exectime=0x7fffa9aaceb0, output=0x0, output_length=0) at utils.c:2699
>> #13 0x00000000004536a3 in xpddefault_run_service_performance_data_command (svc=0x14672c0) at ../xdata/xpddefault.c:469
>> #14 0x0000000000453729 in xpddefault_update_service_performance_data (svc=0x1200011) at ../xdata/xpddefault.c:400
>> #15 0x0000000000453305 in update_service_performance_data (svc=0x1200011) at perfdata.c:91
>> #16 0x0000000000413855 in reap_service_checks () at checks.c:1396
>> #17 0x0000000000421ad2 in handle_timed_event (event=0x778c30) at events.c:1254
>> #18 0x0000000000421e73 in event_execution_loop () at events.c:965
>> #19 0x000000000040efa7 in main (argc=, argv=, env=0x7fffa9aae280) at nagios.c:710
>>
>>
>> thread 2:
>>
>>
>> #0 0x0000003663ac4a36 in poll () from /lib64/libc.so.6
>> #1 0x0000000000429ace in service_result_worker_thread (arg=) at utils.c:4775
>> #2 0x0000003664606305 in start_thread () from /lib64/libpthread.so.0
>> #3 0x0000003663acd50d in clone () from /lib64/libc.so.6
>>
>> thread 3:
>> #0 0x0000003663ac6ac2 in select () from /lib64/libc.so.6
>> #1 0x000000000042996e in command_file_worker_thread (arg=) at utils.c:4943
>> #2 0x0000003664606305 in start_thread () from /lib64/libpthread.so.0
>> #3 0x0000003663acd50d in clone () from /lib64/libc.so.6
>>
>> Source part of thread 1:
>> else if(sig>
>> sigshutdown=TRUE;
>>
>> sprintf(temp_buffer,"Caught SIG%s, shutting down...\n",sigs[sig]);
>> ---> write_to_all_logs(temp_buffer,NSLOG_PROCESS_INFO);
>>
>> Source part of thread 2:
>> while(1){
>>
>> /* should we shutdown? */
>> pthread_testcancel();
>>
>> /* wait for data to arrive */
>> /* select seems to not work, so we have to use poll instead */
>> pfd.fd=ipc_pipe[0];
>> pfd.events=POLLIN;
>> ---> pollval=poll(&pfd,1,500);
>>
>> Source part of thread 3:
>> while(1){
>>
>> /* should we shutdown? */
>> pthread_testcancel();
>>
>> /**** POLL() AND SELECT() DON'T SEEM TO WORK ****/
>>

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]