Segmentation fault when reloading nagios 4
Posted: Thu Dec 12, 2013 3:43 am
I'm running nagios 4 on CentOS 6, with passive results being submitted to the monitoring host via nsca. If I send a SIGHUP to the daemon to reload its configuration, the reload appears to succeed, but the daemon then dies with a SIGSEGV as soon as the first external result arrives. I've included some gdb backtraces and log snippets to illustrate the issue. The behavior appears quite consistent. Any ideas would be most welcome. It is a pain to have to restart rather than reload nagios every time my configuration needs to be updated.
These core dumps were generated by nagios 4.0.0, but I have also tried 4.0.2 and it exhibited the same behavior. I've sanitized the server names and IP addresses, but the output is otherwise untouched.
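The sequence described above (SIGHUP, then one passive result) can be sketched in Python roughly as follows. This is only a sketch for reproducing the crash, not part of nagios or nsca: the function names are made up for illustration, and the PID and command file path passed to `reload_then_submit` are assumptions that depend on the local installation. The command line format is the one visible in the backtraces and logs.

```python
import os
import signal
import time


def format_passive_result(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line in the form seen in the traces.

    Hypothetical helper for illustration; it only formats a string.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return "[{0}] PROCESS_SERVICE_CHECK_RESULT;{1};{2};{3};{4}\n".format(
        timestamp, host, service, return_code, output
    )


def reload_then_submit(pid, command_file, line):
    """Send SIGHUP to the daemon, then write one passive result.

    Assumptions: `pid` is the running nagios PID and `command_file` is the
    path of its external command file (a named pipe) on this system.
    """
    os.kill(pid, signal.SIGHUP)
    # Opening a named pipe for writing blocks until a reader has it open.
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

Calling `format_passive_result("clienthost1.example.com", "ssl-sideA", 2, "CRITICAL - DOWN", 1386832389)` yields a line of the same shape as the `cmd=` argument in frame #5 of the first backtrace.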
Core was generated by `/usr/bin/nagios -ud /etc/nagios-cms/nagios.cfg'.
Program terminated with signal 11, Segmentation fault.
#0 __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp.S:259
259 movdqu (%rdi), %xmm1
(gdb) bt
#0 __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp.S:259
#1 0x000000000042b17c in find_host_by_name_or_address (name=0x1e4ea80 "clienthost1.example.com")
at commands.c:307
#2 0x000000000042e890 in process_passive_service_check (check_time=1386832389,
host_name=0x1e4ea80 "clienthost1.example.com", svc_description=0x1a46c20 "ssl-sideA",
return_code=2,
output=0x1e4ea30 "CRITICAL - DOWN: dbs_prod_web_https4.203.0.113.151.443;")
at commands.c:2215
#3 0x000000000042e7a5 in cmd_process_service_check_result (cmd=30,
check_time=1386832389,
args=0x18488a0 "clienthost1.example.com;ssl-sideA;2;CRITICAL - DOWN: dbs_prod_web_https4.203.0.113.151.443;") at commands.c:2188
#4 0x000000000042ceb7 in process_external_command2 (cmd=30, entry_time=1386832389,
args=0x18488a0 "clienthost1.example.com;ssl-sideA;2;CRITICAL - DOWN: dbs_prod_web_https4.203.0.113.151.443;") at commands.c:1197
#5 0x000000000042ca6f in process_external_command1 (
cmd=0x7f39c22e1010 "[1386832389] PROCESS_SERVICE_CHECK_RESULT;clienthost1.example.com;ssl-sideA;2;CRITICAL - DOWN: dbs_prod_web_https4.203.0.113.151.443;") at commands.c:884
#6 0x000000000042acc4 in command_input_handler (sd=35, events=1, discard=0x0)
at commands.c:153
#7 0x000000000048ee19 in iobroker_poll (iobs=0x10ec0c0, timeout=50) at iobroker.c:364
#8 0x0000000000413bab in main (argc=3, argv=0x7fff04659bc8) at nagios.c:662
(gdb)
[1386832389] Caught SIGHUP, restarting...
[1386832390] Event broker module 'NERD' deinitialized successfully.
[1386832390] livestatus: Socket thread has terminated
[1386832390] Event broker module '/usr/lib/nagios/livestatus.o' deinitialized successfully.
[1386832390] Nagios 4.0.0 starting... (PID=2884)
[1386832390] Local time is Thu Dec 12 17:13:10 EST 2013
[1386832390] LOG VERSION: 2.0
[1386832390] qh: Socket '/var/run/nagios-cms/nagios.qh' successfully initialized
[1386832390] qh: core query handler registered
[1386832390] nerd: Channel hostchecks registered successfully
[1386832390] nerd: Channel servicechecks registered successfully
[1386832390] nerd: Channel opathchecks registered successfully
[1386832390] nerd: Fully initialized and ready to rock!
[1386832390] wproc: Successfully registered manager as @wproc with query handler
[1386832390] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;clienthost1.example.com;ssl-sideA;2;CRITICAL - DOWN: dbs_prod_web_https4.203.0.113.151.443;
[1386832390] Nagios 4.0.0 starting... (PID=2884)
[1386832390] Local time is Thu Dec 12 17:13:10 EST 2013
[1386832390] LOG VERSION: 2.0
Core was generated by `/usr/bin/nagios -ud /etc/nagios-cms/nagios.cfg'.
Program terminated with signal 11, Segmentation fault.
#0 __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp.S:259
259 movdqu (%rdi), %xmm1
(gdb) bt
#0 __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp.S:259
#1 0x000000000042b17c in find_host_by_name_or_address (
name=0x1999ac0 "clienthost2.example.com") at commands.c:307
#2 0x000000000042e890 in process_passive_service_check (check_time=1386832436,
host_name=0x1999ac0 "clienthost2.example.com",
svc_description=0x1b167b0 "hostname", return_code=0,
output=0x12e5040 "DNS OK - 0.014 seconds response time (3.113.0.203.in-addr.arpa. 690 IN PTR clienthost2.example.com.)|time=0.013594s;;;0.000000") at commands.c:2215
#3 0x000000000042e7a5 in cmd_process_service_check_result (cmd=30,
check_time=1386832436,
args=0x1b39da0 "clienthost2.example.com;hostname;0;DNS OK - 0.014 seconds response time (3.113.0.203.in-addr.arpa. 690 IN PTR clienthost2.example.com.)|time=0.013594s;;;0.000000") at commands.c:2188
#4 0x000000000042ceb7 in process_external_command2 (cmd=30, entry_time=1386832436,
args=0x1b39da0 "clienthost2.example.com;hostname;0;DNS OK - 0.014 seconds response time (3.113.0.203.in-addr.arpa. 690 IN PTR clienthost2.example.com.)|time=0.013594s;;;0.000000") at commands.c:1197
#5 0x000000000042ca6f in process_external_command1 (
cmd=0x7f218d6be010 "[1386832436] PROCESS_SERVICE_CHECK_RESULT;clienthost2.example.com;hostname;0;DNS OK - 0.014 seconds response time (3.113.0.203.in-addr.arpa. 690 IN PTR clienthost2.example.com.)|time=0.013594s;;;0.0"...) at commands.c:884
#6 0x000000000042acc4 in command_input_handler (sd=35, events=1, discard=0x0)
at commands.c:153
#7 0x000000000048ee19 in iobroker_poll (iobs=0xdad0c0, timeout=50) at iobroker.c:364
#8 0x0000000000413bab in main (argc=3, argv=0x7fff51ecf5a8) at nagios.c:662
(gdb)
[1386832436] Caught SIGHUP, restarting...
[1386832436] Event broker module 'NERD' deinitialized successfully.
[1386832436] livestatus: Socket thread has terminated
[1386832436] Event broker module '/usr/lib/nagios/livestatus.o' deinitialized successfully.
[1386832436] Nagios 4.0.0 starting... (PID=12929)
[1386832436] Local time is Thu Dec 12 17:13:56 EST 2013
[1386832436] LOG VERSION: 2.0
[1386832436] qh: Socket '/var/run/nagios-cms/nagios.qh' successfully initialized
[1386832436] qh: core query handler registered
[1386832436] nerd: Channel hostchecks registered successfully
[1386832436] nerd: Channel servicechecks registered successfully
[1386832436] nerd: Channel opathchecks registered successfully
[1386832436] nerd: Fully initialized and ready to rock!
[1386832436] wproc: Successfully registered manager as @wproc with query handler
[1386832436] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;clienthost2.example.com;hostname;0;DNS OK - 0.014 seconds response time (3.113.0.203.in-addr.arpa. 690 IN PTR clienthost2.example.com.)|time=0.013594s;;;0.000000
[1386832447] Nagios 4.0.0 starting... (PID=22134)
[1386832447] Local time is Thu Dec 12 17:14:07 EST 2013
[1386832447] LOG VERSION: 2.0
Core was generated by `/usr/bin/nagios -ud /etc/nagios-cms/nagios.cfg'.
Program terminated with signal 11, Segmentation fault.
#0 __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp.S:259
259 movdqu (%rdi), %xmm1
(gdb) bt
#0 __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp.S:259
#1 0x000000000042b17c in find_host_by_name_or_address (
name=0x13f4f40 "clienthost3.example.com") at commands.c:307
#2 0x000000000042e890 in process_passive_service_check (check_time=1386833584,
host_name=0x13f4f40 "clienthost3.example.com",
svc_description=0x1d3fee0 "nsca-submit", return_code=0,
output=0x1d9f870 "OK - submission sent to all 2 nsca servers|") at commands.c:2215
#3 0x000000000042e7a5 in cmd_process_service_check_result (cmd=30,
check_time=1386833584,
args=0x1d8d4d0 "clienthost3.example.com;nsca-submit;0;OK - submission sent to all 2 nsca servers|") at commands.c:2188
#4 0x000000000042ceb7 in process_external_command2 (cmd=30, entry_time=1386833584,
args=0x1d8d4d0 "clienthost3.example.com;nsca-submit;0;OK - submission sent to all 2 nsca servers|") at commands.c:1197
#5 0x000000000042ca6f in process_external_command1 (
cmd=0x7f8c7b1a7010 "[1386833584] PROCESS_SERVICE_CHECK_RESULT;clienthost3.example.com;nsca-submit;0;OK - submission sent to all 2 nsca servers|")
at commands.c:884
#6 0x000000000042acc4 in command_input_handler (sd=35, events=1, discard=0x0)
at commands.c:153
#7 0x000000000048ee19 in iobroker_poll (iobs=0x100f0c0, timeout=50) at iobroker.c:364
#8 0x0000000000413bab in main (argc=3, argv=0x7fff164119c8) at nagios.c:662
(gdb)
[1386833584] Caught SIGHUP, restarting...
[1386833584] Event broker module 'NERD' deinitialized successfully.
[1386833585] livestatus: Socket thread has terminated
[1386833585] Event broker module '/usr/lib/nagios/livestatus.o' deinitialized successfully.
[1386833585] Nagios 4.0.0 starting... (PID=22134)
[1386833585] Local time is Thu Dec 12 17:33:05 EST 2013
[1386833585] LOG VERSION: 2.0
[1386833585] qh: Socket '/var/run/nagios-cms/nagios.qh' successfully initialized
[1386833585] qh: core query handler registered
[1386833585] nerd: Channel hostchecks registered successfully
[1386833585] nerd: Channel servicechecks registered successfully
[1386833585] nerd: Channel opathchecks registered successfully
[1386833585] nerd: Fully initialized and ready to rock!
[1386833585] wproc: Successfully registered manager as @wproc with query handler
[1386833585] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;clienthost3.example.com;nsca-submit;0;OK - submission sent to all 2 nsca servers|
[1386833600] Nagios 4.0.0 starting... (PID=12879)
[1386833600] Local time is Thu Dec 12 17:33:20 EST 2013
[1386833600] LOG VERSION: 2.0