Nagios 4.1.1 on Solaris 11.3 major headaches

Posted: Sun Apr 17, 2016 5:58 pm
by ruffy01
I followed an oracle.com tutorial for installing Nagios 4.0.2 on Solaris 11 and found a few directory differences, which I put down to the fact that I was installing 4.1.1.
Generally all appeared good but it's not good at all!
I'll try to keep this as concise as possible:

The web interface opens fine (http://localhost/nagios/) and Nagios reports it's running with PID 1878. A check on Services shows the localhost services as OK, but the services on the remote host (Security Server, Windows Server 2008 R2) show "Critical": Connection refused.

Now a few checks. COTESS-SYSMON is the Nagios host; the Security server's IP is 192.168.0.115.

root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_ssh -H 192.168.0.115
Connection refused
root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_ssh -H 127.0.0.1
Server answer:

root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_nrpe -H 192.168.0.115
I (0,4,1,73 2012-12-17) seem to be doing fine...
root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
CHECK_NRPE: Error - Could not complete SSL handshake.
root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_nrpe -H localhost
connect to address ::1 port 5666: Connection refused
CHECK_NRPE: Error - Could not complete SSL handshake.

root@COTESS-SYSMON:~# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.1.1
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-19-2015
License: GPL

Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
Checked 15 services.
Checked 2 hosts.
Checked 2 host groups.
Checked 0 service groups.
Checked 1 contacts.
Checked 1 contact groups.
Checked 24 commands.
Checked 5 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 2 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check
root@COTESS-SYSMON:~# /etc/rc.d/init.d/nagios start
bash: /etc/rc.d/init.d/nagios: No such file or directory
root@COTESS-SYSMON:~# /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root@COTESS-SYSMON:~# svcs -xv
svc:/application/nagios:default (?)
State: maintenance since April 18, 2016 07:39:36 AM AEST
Reason: Start method failed repeatedly, last died on Killed (9).
See: http://support.oracle.com/msg/SMF-8000-KS
See: /var/svc/log/application-nagios:default.log
Impact: This service is not running.

Solaris gives no indication of a service running with PID 1878, which Nagios claims to be running under. There is no such process shown with the "top" command.
root@COTESS-SYSMON:~# kill 1878
bash: kill: (1878) - No such process
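
A minimal sketch of the same check in script form: it reads the PID that Nagios recorded in its lock file and tests whether that process still exists. The helper name "lockfile_pid_alive" is hypothetical, and the lock file path in the usage line is an assumption (the usual default for a source install under /usr/local/nagios; the lock_file setting in nagios.cfg is authoritative).

```shell
#!/bin/sh
# lockfile_pid_alive LOCKFILE   (hypothetical helper name)
# Exit status: 0        = the PID in LOCKFILE is a live process
#              2        = LOCKFILE is missing or does not hold a number
#              other    = no such process (stale lock file)
lockfile_pid_alive() {
    pid=$(cat "$1" 2>/dev/null) || return 2
    case $pid in
        ''|*[!0-9]*) return 2 ;;
    esac
    # Signal 0 delivers nothing; kill only reports whether the PID can be signalled.
    # Run this as root: an unprivileged user also gets a failure for
    # processes owned by other users.
    kill -0 "$pid" 2>/dev/null
}
```

Usage, with the assumed path: lockfile_pid_alive /usr/local/nagios/var/nagios.lock && echo "running" || echo "stale or missing PID". A stale result would be consistent with the web interface showing a PID that no longer exists.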

I'm at a real loss here guys, any advice will be greatly appreciated.
Thanks in advance,
Andrew.

Re: Nagios 4.1.1 on Solaris 11.3 major headaches

Posted: Mon Apr 18, 2016 10:36 am
by lmiltchev
ruffy01 wrote:I followed a tutorial to install 4.0.2 on Solaris 11 (oracle.com) and found a few directory differences which I put down to the fact I was installing 4.1.1
Generally all appeared good but it's not good at all!
Can you provide us with a link to the tutorial that you followed?

Re: Nagios 4.1.1 on Solaris 11.3 major headaches

Posted: Mon Apr 18, 2016 4:14 pm
by ruffy01
Sure can. This one here:
http://www.oracle.com/technetwork/artic ... 79071.html

I used the GNU compiler method. As I said above, a few of the files that needed editing were in different directories from those listed, and some didn't need editing at all. I assumed that was because I was compiling 4.1.1, not 4.0.2.

Thanks for the reply.
Please note: I'm a bit of a Unix newbie.

Re: Nagios 4.1.1 on Solaris 11.3 major headaches

Posted: Mon Apr 18, 2016 5:17 pm
by rkennedy
It looks like there are two issues here: the first is with the plugins, and the second is with the PID not being killed. Just to clarify -- is your Nagios running smoothly, with just the service checks failing?
root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_ssh -H 192.168.0.115
Connection refused
root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_ssh -H 127.0.0.1
Server answer:

root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_nrpe -H 192.168.0.115
I (0,4,1,73 2012-12-17) seem to be doing fine...
You mentioned 192.168.0.115 being the Nagios host, but your check indicates it's actually a Windows machine running NSClient++ 0.4.1. This is why your SSH is getting connection refused. Please run nmap 192.168.0.115.

The second part, check_ssh -H 127.0.0.1 should be working fine. What is the output of sshd -h from the Nagios host?
root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
CHECK_NRPE: Error - Could not complete SSL handshake.
root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_nrpe -H localhost
connect to address ::1 port 5666: Connection refused
CHECK_NRPE: Error - Could not complete SSL handshake.
Did you compile Nagios plugins with SSL support? What is the output of running /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -n? The -n flag will run without SSL.

The second check, against localhost, suggests it's failing because it's routing over IPv6. Can you take a look at your /etc/hosts and make sure IPv4 is resolvable?
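
One way to carry out that check is sketched below: list every address the hosts file maps to a given name, so it is obvious whether "localhost" has an IPv4 entry, an IPv6 entry, or both. The helper name "hosts_for_name" is hypothetical, and the default path /etc/inet/hosts is an assumption about where Solaris keeps the file; pass another path as the second argument if yours differs.

```shell
#!/bin/sh
# hosts_for_name NAME [HOSTS_FILE]   (hypothetical helper name)
# Prints one address per line for every entry that lists NAME.
hosts_for_name() {
    name=$1
    file=${2:-/etc/inet/hosts}   # assumed default location
    # Drop comments, then split each line into the address and its names.
    sed 's/#.*//' "$file" | while read -r addr names; do
        for n in $names; do
            if [ "$n" = "$name" ]; then
                echo "$addr"
                break
            fi
        done
    done
}
```

Usage: hosts_for_name localhost. If the output includes ::1 as well as 127.0.0.1, a client may try the IPv6 address first and be refused when nothing is listening on it.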

Re: Nagios 4.1.1 on Solaris 11.3 major headaches

Posted: Mon Apr 18, 2016 6:27 pm
by ruffy01
Sorry, my wording was misleading. COTESS-SYSMON is the Nagios host (IP=192.168.0.33) and the Security server is the remote client (IP=192.168.0.115), so yes, you're right, rkennedy.

I must apologize, I'm not at work today.
I'll be in front of the server from 7:30am tomorrow (22 hours from now, in case we're in different time zones :) ) and I hope you guys can continue with the help.

Greatly appreciate your replies,
Andrew.

I shall run "nmap 192.168.0.115", "/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -n" and "sshd -h" before posting tomorrow :)

Re: Nagios 4.1.1 on Solaris 11.3 major headaches

Posted: Tue Apr 19, 2016 9:34 am
by rkennedy
Sounds good - we will watch for your update.

Re: Nagios 4.1.1 on Solaris 11.3 major headaches

Posted: Tue Apr 19, 2016 5:17 pm
by ruffy01
rkennedy wrote:It looks like there are two issues here. The first being with plugins, and the second with the PID not being killed. Just to clarify -- is your Nagios running smoothly, and just the service checks failing?
.....Nagios appears to be running OK. EDIT: Not really. It displays as though it's running, but I've found it doesn't respond to any commands from the web UI. I believe it may load long enough to display data but is then in fact shut down. It's very strange: the web UI says it's running with PID xxxx, while "svcs -xv" says the service isn't running and is in a state of "maintenance".


You mentioned 192.168.0.115 being the Nagios host, but your check indicates it's actually a Windows machine running NSClient++ 0.4.1. This is why your SSH is getting connection refused. Please run nmap 192.168.0.115.
root@COTESS-SYSMON:~# nmap 192.168.0.115

Starting Nmap 6.25 ( http://nmap.org ) at 2016-04-20 07:46 AEST
Nmap scan report for security.cote.local (192.168.0.115)
Host is up (0.00034s latency).
Not shown: 986 closed ports
PORT      STATE SERVICE
80/tcp    open  http
135/tcp   open  msrpc
139/tcp   open  netbios-ssn
443/tcp   open  https
445/tcp   open  microsoft-ds
1311/tcp  open  rxmon
1433/tcp  open  ms-sql-s
2179/tcp  open  vmrdp
2383/tcp  open  ms-olap4
3389/tcp  open  ms-wbt-server
5666/tcp  open  nrpe
49152/tcp open  unknown
49153/tcp open  unknown
49154/tcp open  unknown
MAC Address: 00:19:B9:EF:1E:C8 (Dell)
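
The scan can be read against the two earlier checks with a small filter, sketched here on the assumption that nmap's normal output format is piped in; the helper name "tcp_port_open" is hypothetical. In the scan above, port 5666 (NRPE) is listed as open while port 22 (SSH) is not listed, which matches check_nrpe succeeding and check_ssh being refused on this host.

```shell
#!/bin/sh
# tcp_port_open PORT   (hypothetical helper name)
# Reads nmap's normal-format output on stdin and succeeds only if
# a line of the form "PORT/tcp open ..." is present.
tcp_port_open() {
    grep -E "^$1/tcp[[:space:]]+open([[:space:]]|\$)" >/dev/null
}
```

Usage: nmap 192.168.0.115 | tcp_port_open 22 || echo "no SSH listener found".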



The second part, check_ssh -H 127.0.0.1 should be working fine. What is the output of sshd -h from the Nagios host?

root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_ssh -H 127.0.0.1
Server answer: root@COTESS-SYSMON:~#

root@COTESS-SYSMON:~# sshd -h
bash: sshd: command not found
.....Do I need to change directories for this command to work?
I tried this:
root@COTESS-SYSMON:~# svcs ssh
STATE STIME FMRI
online 7:36:53 svc:/network/ssh:default



Did you compile Nagios plugins with SSL support? What is the output of running /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -n? The -n flag will run without SSL.
root@COTESS-SYSMON:~# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -n
CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
.....Reading back over the install tutorial, I can't see SSL support being configured, but again, I am a Unix newbie.


The second check, against localhost suggests it's failing because it's routing over IPv6. Can you take a look at your /etc/hosts and make sure IPv4 is resolvable?
"/etc/inet/hosts" exists; "/etc/hosts" is non-existent. Contents of "/etc/inet/hosts":
#
# Copyright 2009 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
# Internet host table
#
127.0.0.1 COTESS-SYSMON localhost loghost
::1 COTESS-SYSMON localhost
Hopefully there is some helpful info there.
Thanks,
Andrew

Re: Nagios 4.1.1 on Solaris 11.3 major headaches

Posted: Wed Apr 20, 2016 4:53 am
by ruffy01
After 6.5 hours of totally unproductive work today trying to get this setup functioning, I'm thinking (my boss is thinking!) maybe a different OS would be the way to go.
Solaris 11.3 appears to be a little flaky on our hardware, and that's on top of the Nagios issues....

Any advice either way will be greatly appreciated. Bottom line is I can't afford too much more time on this configuration.
We have a Dell R300 which will be solely configured to host Nagios on a Unix/Linux platform to monitor several Windows physical servers & several Hyper-V servers (Windows also).
It must be rock solid & reliable. The monitoring is required for compliance regulations relevant to the industry my client operates in.

Cheers,
Andrew.

Re: Nagios 4.1.1 on Solaris 11.3 major headaches

Posted: Wed Apr 20, 2016 11:13 am
by rkennedy
I'd recommend CentOS 7. I wrote this guide, which should work for you without issues.
https://assets.nagios.com/downloads/nag ... entos7.pdf

As for the deal with your plugins, I suspect NRPE wasn't working because the plugins were compiled without SSL. This should work with ease once things are set up properly.

Another option is to use our enterprise product (https://www.nagios.com/products/nagios-xi/) and deploy the OVA file straight to your VM infrastructure. You could also do a source install on your R300 if you want to stick with a bare-metal system.

Re: Nagios 4.1.1 on Solaris 11.3 major headaches

Posted: Wed Apr 20, 2016 5:31 pm
by ruffy01
rkennedy wrote:I'd recommend CentOS 7. I wrote this guide, which should work for you without issues.
https://assets.nagios.com/downloads/nag ... entos7.pdf

As for the deal with your plugins, I suspect NRPE wasn't working because the plugins were compiled without SSL. This should work with ease once things are set up properly.

Another option is to use our enterprise product (https://www.nagios.com/products/nagios-xi/) and deploy the OVA file straight to your VM infrastructure. You could also do a source install on your R300 if you want to stick with a bare-metal system.
Thanks, rkennedy.
Do you have a link to instructions for recompiling the plugins with SSL? I'd like to give that a quick look.
If not, then CentOS it is :)

Cheers,
Andrew.