
Nagios shuts down immediately after launching Core Worker

Posted: Fri Aug 21, 2015 10:46 am
by toddmcgoey
I've installed Nagios, but it fails whenever I try to start it with /etc/init.d/nagios reload or /etc/init.d/nagios start. Every time, within a few seconds of starting, it logs "Caught SIGSEGV, shutting down...". I've checked the ownership of the directories, and everything is set to nagios:nagios for my Nagios install at /data/local/nagios-4.1.

Output is as follows:
[1440171255] Nagios 4.1.0rc1 starting... (PID=22689)
[1440171255] Local time is Fri Aug 21 11:34:15 EDT 2015
[1440171255] LOG VERSION: 2.0
[1440171255] qh: Socket '/data/local/nagios-4.1/var/rw/nagios.qh' successfully initialized
[1440171255] qh: core query handler registered
[1440171255] nerd: Channel hostchecks registered successfully
[1440171255] nerd: Channel servicechecks registered successfully
[1440171255] nerd: Channel opathchecks registered successfully
[1440171255] nerd: Fully initialized and ready to rock!
[1440171255] wproc: Successfully registered manager as @wproc with query handler
[1440171255] wproc: Registry request: name=Core Worker 22693;pid=22693
[1440171255] wproc: Registry request: name=Core Worker 22690;pid=22690
[1440171255] wproc: Registry request: name=Core Worker 22694;pid=22694
[1440171255] wproc: Registry request: name=Core Worker 22695;pid=22695
[1440171255] wproc: Registry request: name=Core Worker 22696;pid=22696
[1440171255] wproc: Registry request: name=Core Worker 22697;pid=22697
[1440171255] wproc: Registry request: name=Core Worker 22698;pid=22698
[1440171256] Successfully launched command file worker with pid 22723
[1440171256] wproc: CHECK job 0 from worker Core Worker 22693 died by signal 9 after 0.01 seconds
[1440171256] Caught SIGSEGV, shutting down...

Anyone else experience this?

Re: Nagios shuts down immediately after launching Core Worker

Posted: Fri Aug 21, 2015 11:00 am
by jolson
Can you tell us a bit about your environment?

OS Version:

cat /etc/*release*
uname -a

SELinux status:

getenforce

How did you install Nagios Core?

history
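
The three checks above can be gathered in one pass. This is only a rough sketch, assuming bash or another POSIX-style shell; the function name collect_env is made up for illustration. It skips getenforce when the tool is not installed, which is expected on non-Linux systems such as Solaris.

```shell
# Print basic environment details, skipping tools that are not installed.
# collect_env is a hypothetical helper name, not part of Nagios.
collect_env() {
    echo "== uname =="
    uname -a
    echo "== release files =="
    # The glob may match nothing; fall back to a note instead of an error.
    cat /etc/*release* 2>/dev/null || echo "(no /etc/*release* files found)"
    echo "== SELinux =="
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo "(getenforce not available on this system)"
    fi
}
```

Running `collect_env > env.txt` would give a single file to paste into a reply. The install history still has to come from `history` in the shell that was used for the build.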

Re: Nagios shuts down immediately after launching Core Worker

Posted: Fri Aug 21, 2015 2:30 pm
by toddmcgoey
Here is the output from the server:
nagios@escdqpvsdvws001 [/data/local/nagios-4.1/var]-> cat /etc/*release*
Oracle Solaris 10 8/11 s10s_u10wos_17b SPARC
Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
Assembled 23 August 2011

Running configtest:
nagios@escdqpvsdvws001 [/data/local/nagios-4.1/var]-> /etc/init.d/nagios configtest
Nagios Core 4.1.0rc1
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 02-18-2015
License: GPL
Website: http://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
Checked 8 services.
Checked 1 hosts.
Checked 1 host groups.
Checked 0 service groups.
Checked 1 contacts.
Checked 1 contact groups.
Checked 24 commands.
Checked 5 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 1 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check
Object precache file created:
/data/local/nagios-4.1/var/objects.precache

Re: Nagios shuts down immediately after launching Core Worker

Posted: Fri Aug 21, 2015 2:31 pm
by toddmcgoey
And:

nagios@escdqpvsdvws001 [/data/local/nagios-4.1/var]-> uname -a
SunOS escdqpvsdvws001 5.10 Generic_150400-09 sun4v sparc sun4v

Re: Nagios shuts down immediately after launching Core Worker

Posted: Fri Aug 21, 2015 2:45 pm
by toddmcgoey
After my initial install, I was able to start Nagios successfully. I started it as 'root', even though the install is owned by the user 'nagios'. When I checked the running processes, there were numerous defunct processes as well. I issued 'kill -9 8486' to clear these defunct processes, then tried to stop Nagios and start it again. It has never started properly since, whether I start it as 'nagios' or as 'root'.

bash-3.2# /etc/init.d/nagios start
Starting nagios: done.
bash-3.2# ps -ef | grep nagios
nagios 8517 8486 0 - ? 0:00 <defunct>
nagios 8505 8486 0 - ? 0:00 <defunct>
nagios 8515 8486 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios --worker /data/local/nagios-4.1/var/rw/nagios
nagios 8506 8486 0 - ? 0:00 <defunct>
nagios 8516 8486 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios --worker /data/local/nagios-4.1/var/rw/nagios
nagios 8769 8486 0 14:31:18 ? 0:00 /data/local/nagios-4.1/bin/nagios -d /data/local/nagios-4.1/etc/nagios.cfg
nagios 8497 8486 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios --worker /data/local/nagios-4.1/var/rw/nagios
nagios 8522 8486 0 - ? 0:00 <defunct>
nagios 8498 8486 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios --worker /data/local/nagios-4.1/var/rw/nagios
nagios 8510 8486 0 - ? 0:00 <defunct>
nagios 8521 8486 0 - ? 0:00 <defunct>
root 9195 10369 0 14:31:23 pts/1 0:00 grep nagios
nagios 8514 8486 0 - ? 0:00 <defunct>
nagios 8494 8486 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios --worker /data/local/nagios-4.1/var/rw/nagios
nagios 8512 8486 0 - ? 0:00 <defunct>
nagios 8499 8486 0 - ? 0:00 <defunct>
nagios 8503 8486 0 - ? 0:00 <defunct>
nagios 8486 2338 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios -d /data/local/nagios-4.1/etc/nagios.cfg
nagios 8507 8486 0 - ? 0:00 <defunct>
nagios 8508 8486 0 - ? 0:00 <defunct>
nagios 8502 8486 0 - ? 0:00 <defunct>
nagios 8493 8486 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios --worker /data/local/nagios-4.1/var/rw/nagios
nagios 8519 8486 0 - ? 0:00 <defunct>
nagios 8501 8486 0 - ? 0:00 <defunct>
nagios 8520 8486 0 - ? 0:00 <defunct>
nagios 8509 8486 0 - ? 0:00 <defunct>
nagios 8495 8486 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios --worker /data/local/nagios-4.1/var/rw/nagios
bash-3.2# kill -9 8486
bash-3.2# ps -ef | grep nagios
nagios 8515 2338 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios --worker /data/local/nagios-4.1/var/rw/nagios
nagios 8516 2338 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios --worker /data/local/nagios-4.1/var/rw/nagios
nagios 8497 2338 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios --worker /data/local/nagios-4.1/var/rw/nagios
nagios 8498 2338 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios --worker /data/local/nagios-4.1/var/rw/nagios
nagios 8494 2338 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios --worker /data/local/nagios-4.1/var/rw/nagios
nagios 8493 2338 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios --worker /data/local/nagios-4.1/var/rw/nagios
nagios 8495 2338 0 14:31:15 ? 0:00 /data/local/nagios-4.1/bin/nagios --worker /data/local/nagios-4.1/var/rw/nagios
root 12421 10369 0 14:32:03 pts/1 0:00 grep nagios
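
A note on the listings above: a <defunct> (zombie) process has already exited and is only waiting for its parent to collect its exit status, so kill -9 has no effect on it. It goes away once the parent reaps it or the parent itself exits. That is consistent with the second listing, where killing 8486 cleared the defunct entries and left the surviving workers with parent 2338. Below is a minimal sketch for summarizing defunct entries per parent PID from `ps -ef` output. The function name count_defunct is made up for illustration, and depending on the system a different awk (for example nawk) may be needed.

```shell
# Read `ps -ef` output on stdin and print "<PPID> <count>" for each parent
# that has <defunct> children. In `ps -ef` output, field 3 is the PPID.
# count_defunct is a hypothetical helper name.
count_defunct() {
    awk '/<defunct>/ { n[$3]++ } END { for (p in n) print p, n[p] }'
}
```

For example, `ps -ef | count_defunct` prints one line per parent that is holding zombies, which shows which process to stop cleanly (for instance through the init script) rather than chasing the defunct entries themselves.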

Re: Nagios shuts down immediately after launching Core Worker

Posted: Mon Aug 24, 2015 9:41 am
by jdalrymple
The SPARC footprint isn't very big, so it's unlikely that anyone else has run across this yet, unless it's not a problem specific to 4.1.

The best advice I'd have at this point is to look through your compiler's output and see if there were any warnings or errors that sounded particularly discouraging. You could also turn debugging up in nagios.cfg and see if it offers any information, though I don't think it will.
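
For the debugging suggestion, the relevant nagios.cfg directives are debug_level, debug_verbosity and debug_file. A rough sketch for checking what is currently set is below; the helper name show_debug_settings is made up for illustration, and the debug_file path in the comments is an assumption based on the install prefix mentioned earlier in this thread.

```shell
# Print the debug_* directives currently set in a Nagios main config file.
# show_debug_settings is a hypothetical helper name.
# Usage: show_debug_settings /data/local/nagios-4.1/etc/nagios.cfg
show_debug_settings() {
    grep '^debug_' "$1"
}

# Values to try in nagios.cfg for maximum detail:
#   debug_level=-1       (log everything)
#   debug_verbosity=2    (most detailed)
#   debug_file=/data/local/nagios-4.1/var/nagios.debug   (assumed path)
```

After changing the values, restart Nagios and check the debug file for the last entries written before the SIGSEGV.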

There are some instructions on Oracle's site for building on Oracle Solaris - I'd look there for anything that applies.

http://www.oracle.com/technetwork/artic ... 79071.html