Solaris 11 and Nagios 4.2.0

jacolza
Posts: 7
Joined: Fri Aug 26, 2016 6:51 am

Post by jacolza »

Has anybody had any success getting Nagios 4.2.0 running in a Solaris 11 SPARC environment? It compiled fine with GCC 4.8.2, but when I do a config check it spits out a "Bus Error":
# ./bin/nagios -v ./etc/nagios.cfg
Reading configuration data...
Read main config file okay...
Bus Error

This is a clean install from scratch that I am testing with. I configured and compiled the source as follows:
./configure --prefix=/usr/local/nagios \
--with-command-group=nagcmd \
--with-httpd-conf=/etc/apache2/2.4/conf.d \
--with-gd-inc=/usr/include/gd2 \
--disable-statuswrl

This setup works fine with Nagios 3.5.1. Any ideas are welcome.

Regards
Last edited by dwhitfield on Fri Nov 18, 2016 10:20 am, edited 1 time in total.
Reason: marking with green check mark
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Solaris 11 and Nagios 4.2.0

Post by mcapra »

Oracle has some official documentation for Solaris 11:
http://www.oracle.com/technetwork/artic ... 79071.html

Could you run through their steps and let us know if you encounter any issues along the way?
jacolza

Re: Solaris 11 and Nagios 4.2.0

Post by jacolza »

Thanks for the pointer. I had a look at the Oracle documentation and gave it a go, even though it was written for version 4.0.2.

I first tried to compile with Sun Studio 12.3, but this proved to be a near-impossible task, so I reverted to GCC ("gcc version 4.8.2 (GCC)"), as the configure script seems to be optimized for GCC.

The "structure" issue quoted below from the Oracle document seems to have been resolved since version 4.1.1:
"this version of Nagios defines a structure (struct comment) that conflicts with a system structure of the same name in /usr/include/sys/pwd.h. Perform the following steps to fix this issue."
I therefore did not make the changes to the source and make files that the document describes.

I ran ./configure with the following switches:
./configure --prefix=/usr/local/nagios \
--with-command-group=nagcmd \
--with-httpd-conf=/etc/apache2/2.4/conf.d \
--with-gd-inc=/usr/include/gd2 \
--disable-statuswrl

Configure runs fine and produces the following summary:
*** Configuration summary for nagios 4.2.0 08-01-2016 ***:

General Options:
-------------------------
Nagios executable: nagios
Nagios user/group: nagios,nagios
Command user/group: nagios,nagcmd
Event Broker: yes
Install ${prefix}: /usr/local/nagios
Install ${includedir}: /usr/local/nagios/include/nagios
Lock file: ${prefix}/var/nagios.lock
Check result directory: ${prefix}/var/spool/checkresults
Init directory: /etc/init.d
Apache conf.d directory: /etc/apache2/2.4/conf.d
Mail program: /usr/bin/mail
Host OS: solaris2.11
IOBroker Method: poll

Web Interface Options:
------------------------
HTML URL: http://localhost/nagios/
CGI URL: http://localhost/nagios/cgi-bin/
Traceroute (used by WAP): /usr/bin/traceroute


Running "gmake all" compiles all the components with GCC without errors and produces only the following warning:
workers.c: In function 'wproc_run_job':
workers.c:1053:6: warning: format '%lu' expects argument of type 'long unsigned int', but argument 7 has type 'ssize_t' [-Wformat=]
wp->name, ret, kvvb->bufsize, written, errno, strerror(errno));

^
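
The warning above comes from passing an ssize_t to a '%lu' conversion. A minimal sketch of the usual remedy follows, using the C99 '%zd' conversion, which in practice matches ssize_t on common platforms (casting to long and using '%ld' is a more conservative alternative). The helper name format_written is hypothetical and is not taken from workers.c:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical helper (not from workers.c): formats a byte count the way
 * a log line might. "%zd" is the conversion for the signed counterpart of
 * size_t, so the ssize_t argument needs no cast and triggers no -Wformat
 * warning with GCC. */
int format_written(char *buf, size_t buflen, ssize_t written)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "wrote %zd bytes", written);
}
```

A mismatched conversion like this is only a warning and is unlikely to be related to the Bus Error by itself, but it is cheap to silence.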

After installing all the files in /usr/local/nagios, I still get the "Bus Error" when verifying the default sample config files:
:nagios# ./bin/nagios -v ./etc/nagios.cfg

Nagios Core 4.2.0
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-01-2016
License: GPL

Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Bus Error


Is there anything else I can do on my side to shed more light on the "Bus Error" problem?

Regards
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Solaris 11 and Nagios 4.2.0

Post by Box293 »

I suspect the issue you are having is going to be fixed in the next version.

I think the maint branch has the update which resolves this issue:

https://github.com/NagiosEnterprises/na ... tree/maint

I am in the process of putting together installation documentation which will include Solaris. It's probably a month away from being published.
jacolza

Re: Solaris 11 and Nagios 4.2.0

Post by jacolza »

Thank you for the feedback and I am looking forward to the Solaris install documentation.

I did some more digging on my side and ran truss against the nagios binary as follows:
/usr/local/nagios:# truss ./bin/nagios -v ./etc/nagios.cfg

This produced more detail on the Bus Error. The extract of interest is below; it seems to indicate an issue reading in the default localhost.cfg file:
open("/usr/local/nagios/etc/objects/localhost.cfg", O_RDONLY) = 6
fstat(6, 0xFFBFF3B8) = 0
mmap(0x00000000, 5379, PROT_READ, MAP_PRIVATE, 6, 0) = 0xFEF90000
munmap(0xFEF90000, 5379) = 0
close(6) = 0
munmap(0xFEFA0000, 44831) = 0
close(5) = 0
Incurred fault #5, FLTACCESS %pc = 0x000A1288
siginfo: SIGBUS BUS_ADRALN addr=0x00000021
Received signal #10, SIGBUS [default]
siginfo: SIGBUS BUS_ADRALN addr=0x00000021


If I make an empty localhost.cfg file, I get the same error. For reference, I have attached the complete output from the truss command.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Solaris 11 and Nagios 4.2.0

Post by tmcdonald »

Might be memory alignment?

https://www.litespeedtech.com/support/f ... rashes.75/

Our Core dev would need to look at this. I'll send it his way and see what he thinks.
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: Solaris 11 and Nagios 4.2.0

Post by lmiltchev »

FYI, I posted the issue on GitHub - https://github.com/NagiosEnterprises/na ... issues/285 to make sure it does not "fall through the cracks". Thank you!
jfrickson

Re: Solaris 11 and Nagios 4.2.0

Post by jfrickson »

I have the fix in the 'maint' branch of nagios core (https://github.com/NagiosEnterprises/na ... tree/maint) via commit https://github.com/NagiosEnterprises/na ... 74340c63de.

Edit: Sorry, wrong forum post. The problem mentioned here is not resolved yet. See my next message.
jfrickson

Re: Solaris 11 and Nagios 4.2.0

Post by jfrickson »

I compiled nagios core on a Solaris 11 SPARC system and ran it (with and without the '-v') as a regular user. I did not get a SIGBUS error. I used version 4.2.1 and compiled with gcc 4.9.2. That is a slightly different version of core and a different version of gcc, so that may or may not have something to do with it.

Your truss output shows two instances of munmap and close, so that means it has completed reading the config files. The localhost.cfg is just the last one it reads.

siginfo: SIGBUS BUS_ADRALN addr=0x00000021 -- That looks like an invalid memory address. So it's probably not an alignment issue.
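
For context on reading that siginfo line: BUS_ADRALN reports an access at an address that is not aligned for the operand size, and SPARC requires naturally aligned loads and stores. 0x21 is both odd and very close to zero, which would be consistent with an access through a NULL or corrupted pointer rather than a data-layout problem. A minimal sketch of the two ideas, with hypothetical helper names that are not part of Nagios:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if addr is a multiple of alignment (which must be a power of two). */
int is_aligned(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Reads a 32-bit value from any byte offset without a misaligned load:
 * memcpy lets the compiler fall back to byte-wise access where needed. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Here is_aligned(0x21, 4) is false, so any 4-byte load at that address would fault on SPARC no matter what memory, if any, is mapped there.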

I would like you to try core 4.2.1. After configuring and making, instead of running make install, run make install-unstripped, so the executable retains debug info. Change your nagios.cfg file to enable core dumps (set daemon_dumps_core=1 near the bottom.) Then run nagios with the "-v" option again, as a user other than root. If it still has problems, find and upload the core dump file so I can take a look. Also try running under "truss" and upload that as well. With the debug info in the executable, both the truss output and the core file should provide better information.
jacolza

Re: Solaris 11 and Nagios 4.2.0

Post by jacolza »

Thanks, I got nagios 4.2.1 compiled and running stably. Even when running ./bin/nagios -v ./etc/nagios.cfg as root, there is no "Bus Error". The daemon is stable and running, but I do observe an increase in system load from an average of 2 to an average of between 8 and 9.

I also see a lot of <defunct> processes when doing a "ps -fu nagios" (output attached), which is a bit of a concern. Running ptree on one of the <defunct> processes looks like this:
/$ ptree 29058
4123 zsched
27126 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
27127 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios
29058 <defunct>
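
A <defunct> entry is a child that has exited but whose parent has not yet collected its exit status. A minimal sketch of non-blocking reaping with waitpid() follows; the function names are hypothetical and this is not the Nagios worker code:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts a child that exits immediately with the given status code.
 * Returns the child's PID, or -1 if fork() failed. */
pid_t spawn_short_lived_child(int exit_code)
{
    assert(exit_code >= 0 && exit_code <= 255);
    pid_t pid = fork();
    if (pid == 0)
        _exit(exit_code);   /* child: terminate without running parent cleanup */
    return pid;
}

/* Collects every child that has already exited, without blocking.
 * Returns how many were reaped. Until the parent does this, each
 * exited child is listed as <defunct> by ps. */
int reap_finished_children(void)
{
    int reaped = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```

A short-lived <defunct> entry between a check finishing and the worker reaping it is normal; entries that keep accumulating would suggest the parent is not reaping them.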

I have set the "check_workers" parameter to "8" in nagios.cfg; otherwise the number of worker processes seems to run away when it is allocated dynamically based on 1.5 * number of CPUs. I have also attached the truss output from one of the worker PIDs for reference.
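
To illustrate why a fixed value helps on a machine with many hardware threads, here is the arithmetic for the 1.5 * CPUs rule as described above. This is a sketch of that description only, not the Nagios source; default_worker_count and online_cpus are hypothetical names, and the CPU count in the note below is a made-up example:

```c
#include <assert.h>
#include <unistd.h>

/* Worker count under the "1.5 * number of CPUs" rule described in the post. */
long default_worker_count(long cpus)
{
    assert(cpus > 0);
    return cpus * 3 / 2;   /* integer form of cpus * 1.5, rounded down */
}

/* Number of CPUs currently online as reported by the OS, or -1 on error. */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

For example, default_worker_count(128) gives 192 workers, whereas check_workers=8 pins the pool at 8.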

Thanks for all the assistance so far.

Jaco Lesch
SAIX HLS
Attachments
truss-ng-worker.txt
Truss trace output from a nagios worker process
(1.15 MiB) Downloaded 390 times
ps-nagios.txt
Output from "ps -fu nagios"
(4.58 KiB) Downloaded 373 times