Segmentation fault when reloading nagios 4

Posted: Thu Dec 12, 2013 3:43 am
by matthewmelvin
I'm running nagios 4 on CentOS 6, with passive results being submitted to the monitoring host via nsca. If I send a SIGHUP to the daemon to reload its configuration, it appears to do so successfully, but then dies with a SIGSEGV as soon as the first external result arrives. I've included some gdb backtraces and log snippets to illustrate the issue. The behavior appears quite consistent. Any ideas would be most welcome. It is a pain to have to restart rather than reload nagios every time my configuration needs to be updated.
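For anyone who wants to try reproducing this without nsca in the loop, the reload-then-submit sequence can be roughly sketched as below. This is only a sketch under assumptions: the function names are made up for illustration, the PID and command file path must be supplied for your own install (neither path nor helper comes from nagios itself), and it writes straight to the external command file rather than going through nsca. The command line format is the one visible in the backtraces and logs.

```python
import os
import signal
import time


def format_passive_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line.

    Hypothetical helper: the layout mirrors the lines seen in nagios.log,
    "[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output".
    """
    timestamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def reload_then_submit(pid, command_file, line):
    """Send SIGHUP to the daemon, then write one passive result.

    Assumption: command_file is the external command file (a named pipe)
    configured for the running nagios instance.
    """
    os.kill(pid, signal.SIGHUP)
    # Opening a FIFO for writing blocks until a reader has it open.
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

A call would look something like `reload_then_submit(nagios_pid, "/path/to/nagios.cmd", format_passive_result("clienthost1.example.com", "ssl-sideA", 2, "CRITICAL - DOWN"))`, where both `nagios_pid` and the path are placeholders. Whether this triggers the crash as reliably as a result arriving through nsca is untested.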

These core dumps were generated by nagios 4.0.0, but I have also tried 4.0.2 and it exhibited the same behavior. I've sanitized the server names and IP addresses, but the output is otherwise untouched.

Code:

Core was generated by `/usr/bin/nagios -ud /etc/nagios-cms/nagios.cfg'.
Program terminated with signal 11, Segmentation fault.
#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp.S:259
259		movdqu	(%rdi), %xmm1
(gdb) bt
#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp.S:259
#1  0x000000000042b17c in find_host_by_name_or_address (name=0x1e4ea80 "clienthost1.example.com")
    at commands.c:307
#2  0x000000000042e890 in process_passive_service_check (check_time=1386832389, 
    host_name=0x1e4ea80 "clienthost1.example.com", svc_description=0x1a46c20 "ssl-sideA", 
    return_code=2, 
    output=0x1e4ea30 "CRITICAL - DOWN: dbs_prod_web_https4.203.0.113.151.443;")
    at commands.c:2215
#3  0x000000000042e7a5 in cmd_process_service_check_result (cmd=30, 
    check_time=1386832389, 
    args=0x18488a0 "clienthost1.example.com;ssl-sideA;2;CRITICAL - DOWN: dbs_prod_web_https4.203.0.113.151.443;") at commands.c:2188
#4  0x000000000042ceb7 in process_external_command2 (cmd=30, entry_time=1386832389, 
    args=0x18488a0 "clienthost1.example.com;ssl-sideA;2;CRITICAL - DOWN: dbs_prod_web_https4.203.0.113.151.443;") at commands.c:1197
#5  0x000000000042ca6f in process_external_command1 (
    cmd=0x7f39c22e1010 "[1386832389] PROCESS_SERVICE_CHECK_RESULT;clienthost1.example.com;ssl-sideA;2;CRITICAL - DOWN: dbs_prod_web_https4.203.0.113.151.443;") at commands.c:884
#6  0x000000000042acc4 in command_input_handler (sd=35, events=1, discard=0x0)
    at commands.c:153
#7  0x000000000048ee19 in iobroker_poll (iobs=0x10ec0c0, timeout=50) at iobroker.c:364
#8  0x0000000000413bab in main (argc=3, argv=0x7fff04659bc8) at nagios.c:662
(gdb) 

[1386832389] Caught SIGHUP, restarting...
[1386832390] Event broker module 'NERD' deinitialized successfully.
[1386832390] livestatus: Socket thread has terminated
[1386832390] Event broker module '/usr/lib/nagios/livestatus.o' deinitialized successfully.
[1386832390] Nagios 4.0.0 starting... (PID=2884)
[1386832390] Local time is Thu Dec 12 17:13:10 EST 2013
[1386832390] LOG VERSION: 2.0
[1386832390] qh: Socket '/var/run/nagios-cms/nagios.qh' successfully initialized
[1386832390] qh: core query handler registered
[1386832390] nerd: Channel hostchecks registered successfully
[1386832390] nerd: Channel servicechecks registered successfully
[1386832390] nerd: Channel opathchecks registered successfully
[1386832390] nerd: Fully initialized and ready to rock!
[1386832390] wproc: Successfully registered manager as @wproc with query handler
[1386832390] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;clienthost1.example.com;ssl-sideA;2;CRITICAL - DOWN: dbs_prod_web_https4.203.0.113.151.443;
[1386832390] Nagios 4.0.0 starting... (PID=2884)
[1386832390] Local time is Thu Dec 12 17:13:10 EST 2013
[1386832390] LOG VERSION: 2.0

Code:

Core was generated by `/usr/bin/nagios -ud /etc/nagios-cms/nagios.cfg'.
Program terminated with signal 11, Segmentation fault.
#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp.S:259
259		movdqu	(%rdi), %xmm1
(gdb) bt
#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp.S:259
#1  0x000000000042b17c in find_host_by_name_or_address (
    name=0x1999ac0 "clienthost2.example.com") at commands.c:307
#2  0x000000000042e890 in process_passive_service_check (check_time=1386832436, 
    host_name=0x1999ac0 "clienthost2.example.com", 
    svc_description=0x1b167b0 "hostname", return_code=0, 
    output=0x12e5040 "DNS OK - 0.014 seconds response time (3.113.0.203.in-addr.arpa. 690 IN PTR clienthost2.example.com.)|time=0.013594s;;;0.000000") at commands.c:2215
#3  0x000000000042e7a5 in cmd_process_service_check_result (cmd=30, 
    check_time=1386832436, 
    args=0x1b39da0 "clienthost2.example.com;hostname;0;DNS OK - 0.014 seconds response time (3.113.0.203.in-addr.arpa. 690 IN PTR clienthost2.example.com.)|time=0.013594s;;;0.000000") at commands.c:2188
#4  0x000000000042ceb7 in process_external_command2 (cmd=30, entry_time=1386832436, 
    args=0x1b39da0 "clienthost2.example.com;hostname;0;DNS OK - 0.014 seconds response time (3.113.0.203.in-addr.arpa. 690 IN PTR clienthost2.example.com.)|time=0.013594s;;;0.000000") at commands.c:1197
#5  0x000000000042ca6f in process_external_command1 (
    cmd=0x7f218d6be010 "[1386832436] PROCESS_SERVICE_CHECK_RESULT;clienthost2.example.com;hostname;0;DNS OK - 0.014 seconds response time (3.113.0.203.in-addr.arpa. 690 IN PTR clienthost2.example.com.)|time=0.013594s;;;0.0"...) at commands.c:884
#6  0x000000000042acc4 in command_input_handler (sd=35, events=1, discard=0x0)
    at commands.c:153
#7  0x000000000048ee19 in iobroker_poll (iobs=0xdad0c0, timeout=50) at iobroker.c:364
#8  0x0000000000413bab in main (argc=3, argv=0x7fff51ecf5a8) at nagios.c:662
(gdb) 

[1386832436] Caught SIGHUP, restarting...
[1386832436] Event broker module 'NERD' deinitialized successfully.
[1386832436] livestatus: Socket thread has terminated
[1386832436] Event broker module '/usr/lib/nagios/livestatus.o' deinitialized successfully.
[1386832436] Nagios 4.0.0 starting... (PID=12929)
[1386832436] Local time is Thu Dec 12 17:13:56 EST 2013
[1386832436] LOG VERSION: 2.0
[1386832436] qh: Socket '/var/run/nagios-cms/nagios.qh' successfully initialized
[1386832436] qh: core query handler registered
[1386832436] nerd: Channel hostchecks registered successfully
[1386832436] nerd: Channel servicechecks registered successfully
[1386832436] nerd: Channel opathchecks registered successfully
[1386832436] nerd: Fully initialized and ready to rock!
[1386832436] wproc: Successfully registered manager as @wproc with query handler
[1386832436] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;clienthost2.example.com;hostname;0;DNS OK - 0.014 seconds response time (3.113.0.203.in-addr.arpa. 690 IN PTR clienthost2.example.com.)|time=0.013594s;;;0.000000
[1386832447] Nagios 4.0.0 starting... (PID=22134)
[1386832447] Local time is Thu Dec 12 17:14:07 EST 2013
[1386832447] LOG VERSION: 2.0

Code:

Core was generated by `/usr/bin/nagios -ud /etc/nagios-cms/nagios.cfg'.
Program terminated with signal 11, Segmentation fault.
#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp.S:259
259		movdqu	(%rdi), %xmm1
(gdb) bt
#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp.S:259
#1  0x000000000042b17c in find_host_by_name_or_address (
    name=0x13f4f40 "clienthost3.example.com") at commands.c:307
#2  0x000000000042e890 in process_passive_service_check (check_time=1386833584, 
    host_name=0x13f4f40 "clienthost3.example.com", 
    svc_description=0x1d3fee0 "nsca-submit", return_code=0, 
    output=0x1d9f870 "OK - submission sent to all 2 nsca servers|") at commands.c:2215
#3  0x000000000042e7a5 in cmd_process_service_check_result (cmd=30, 
    check_time=1386833584, 
    args=0x1d8d4d0 "clienthost3.example.com;nsca-submit;0;OK - submission sent to all 2 nsca servers|") at commands.c:2188
#4  0x000000000042ceb7 in process_external_command2 (cmd=30, entry_time=1386833584, 
    args=0x1d8d4d0 "clienthost3.example.com;nsca-submit;0;OK - submission sent to all 2 nsca servers|") at commands.c:1197
#5  0x000000000042ca6f in process_external_command1 (
    cmd=0x7f8c7b1a7010 "[1386833584] PROCESS_SERVICE_CHECK_RESULT;clienthost3.example.com;nsca-submit;0;OK - submission sent to all 2 nsca servers|")
    at commands.c:884
#6  0x000000000042acc4 in command_input_handler (sd=35, events=1, discard=0x0)
    at commands.c:153
#7  0x000000000048ee19 in iobroker_poll (iobs=0x100f0c0, timeout=50) at iobroker.c:364
#8  0x0000000000413bab in main (argc=3, argv=0x7fff164119c8) at nagios.c:662
(gdb) 

[1386833584] Caught SIGHUP, restarting...
[1386833584] Event broker module 'NERD' deinitialized successfully.
[1386833585] livestatus: Socket thread has terminated
[1386833585] Event broker module '/usr/lib/nagios/livestatus.o' deinitialized successfully.
[1386833585] Nagios 4.0.0 starting... (PID=22134)
[1386833585] Local time is Thu Dec 12 17:33:05 EST 2013
[1386833585] LOG VERSION: 2.0
[1386833585] qh: Socket '/var/run/nagios-cms/nagios.qh' successfully initialized
[1386833585] qh: core query handler registered
[1386833585] nerd: Channel hostchecks registered successfully
[1386833585] nerd: Channel servicechecks registered successfully
[1386833585] nerd: Channel opathchecks registered successfully
[1386833585] nerd: Fully initialized and ready to rock!
[1386833585] wproc: Successfully registered manager as @wproc with query handler
[1386833585] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;clienthost3.example.com;nsca-submit;0;OK - submission sent to all 2 nsca servers|
[1386833600] Nagios 4.0.0 starting... (PID=12879)
[1386833600] Local time is Thu Dec 12 17:33:20 EST 2013
[1386833600] LOG VERSION: 2.0

Re: Segmentation fault when reloading nagios 4

Posted: Thu Dec 12, 2013 11:47 am
by abrist
What version of nsca are you running on the remote host and the nagios server?

Re: Segmentation fault when reloading nagios 4

Posted: Thu Dec 12, 2013 1:37 pm
by matthewmelvin
abrist wrote: What version of nsca are you running on the remote host and the nagios server?
nsca 2.7.2 on both ends.

Re: Segmentation fault when reloading nagios 4

Posted: Thu Dec 12, 2013 3:03 pm
by abrist
Do you get the segfault when you restart nagios, or just when you reload?

Re: Segmentation fault when reloading nagios 4

Posted: Thu Dec 12, 2013 5:01 pm
by matthewmelvin
abrist wrote: Do you get the segfault when you restart nagios, or just when you reload?
Only when reloading.

Re: Segmentation fault when reloading nagios 4

Posted: Thu Dec 12, 2013 5:12 pm
by abrist
I could have sworn this bug was already reported, but I cannot find it on the tracker. Please report this bug at http://tracker.nagios.org and post a link to the bug in this forum afterwards. Thank you kindly.

Re: Segmentation fault when reloading nagios 4

Posted: Thu Dec 12, 2013 6:59 pm
by matthewmelvin
Existing bug http://tracker.nagios.org/view.php?id=548 appears to be the same issue. I've appended my details to it.

Re: Segmentation fault when reloading nagios 4

Posted: Fri Dec 13, 2013 10:12 am
by abrist
You found it! I know our core devs are aware of this. You can follow the progress on the tracker.

Re: Segmentation fault when reloading nagios 4

Posted: Fri Dec 13, 2013 10:12 am
by slansing
Great, thank you. Updates will be made through that tracker.