Nagios 4.1.1 on Solaris 11.3 major headaches
Posted: Sun Apr 17, 2016 5:58 pm
I followed a tutorial (on oracle.com) for installing Nagios 4.0.2 on Solaris 11 and found a few directory differences, which I put down to the fact that I was installing 4.1.1.
Everything appeared to install fine, but it is not working properly at all!
I'll try to keep this as concise as possible:
The web interface opens fine (http://localhost/nagios/) and Nagios reports that it is running with PID 1878. The Services page shows the localhost services as OK, but the services on the remote host (Security Server, Windows Server 2008 R2) show "Critical": Connection refused.
Now a few checks. COTESS-SYSMON is the Nagios host; the Security Server's IP is 192.168.0.115.
root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_ssh -H 192.168.0.115
Connection refused
root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_ssh -H 127.0.0.1
Server answer:
root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_nrpe -H 192.168.0.115
I (0,4,1,73 2012-12-17) seem to be doing fine...
root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
CHECK_NRPE: Error - Could not complete SSL handshake.
root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_nrpe -H localhost
connect to address ::1 port 5666: Connection refused
CHECK_NRPE: Error - Could not complete SSL handshake.
root@COTESS-SYSMON:~# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.1.1
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-19-2015
License: GPL
Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
Checked 15 services.
Checked 2 hosts.
Checked 2 host groups.
Checked 0 service groups.
Checked 1 contacts.
Checked 1 contact groups.
Checked 24 commands.
Checked 5 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 2 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
root@COTESS-SYSMON:~# /etc/rc.d/init.d/nagios start
bash: /etc/rc.d/init.d/nagios: No such file or directory
root@COTESS-SYSMON:~# /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root@COTESS-SYSMON:~# svcs -xv
svc:/application/nagios:default (?)
State: maintenance since April 18, 2016 07:39:36 AM AEST
Reason: Start method failed repeatedly, last died on Killed (9).
See: http://support.oracle.com/msg/SMF-8000-KS
See: /var/svc/log/application-nagios:default.log
Impact: This service is not running.
Solaris gives no indication of a process with PID 1878, which is the PID Nagios claims to be running under. No such process shows up in "top" either.
root@COTESS-SYSMON:~# kill 1878
bash: kill: (1878) - No such process
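For anyone wanting to repeat the PID check without sending a real signal, here is a minimal sketch. The helper name pid_alive is just something I made up for illustration, and the default of 1878 is simply the PID the web interface reported on my box; pass a different PID as the first argument if needed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): succeeds if the PID is a live process.
pid_alive() {
    # kill -0 delivers no signal; it only tests whether the process exists
    # and whether we are permitted to signal it.
    kill -0 "$1" 2>/dev/null
}

# Assumption: 1878 is the PID shown in the Nagios web interface.
pid=${1:-1878}

if pid_alive "$pid"; then
    echo "PID $pid is running"
else
    echo "PID $pid is not running"
fi
```

If this reports "not running" while the web interface still shows the PID, the interface is presumably displaying stale information rather than a live daemon.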
I'm at a real loss here, guys. Any advice will be greatly appreciated.
Thanks in advance,
Andrew.