Has anybody had any success getting Nagios 4.2.0 running in a Solaris 11 SPARC environment? It compiled fine with GCC 4.8.2, but when I run a config check it fails with a "Bus Error":
# ./bin/nagios -v ./etc/nagios.cfg
Reading configuration data...
Read main config file okay...
Bus Error
This is a clean install from scratch that I am testing with. I configured and compiled the source as follows:
./configure --prefix=/usr/local/nagios \
--with-command-group=nagcmd \
--with-httpd-conf=/etc/apache2/2.4/conf.d \
--with-gd-inc=/usr/include/gd2 \
--disable-statuswrl
This setup works fine with Nagios 3.5.1. Any ideas are welcome.
Regards
Solaris 11 and Nagios 4.2.0
Last edited by dwhitfield on Fri Nov 18, 2016 10:20 am, edited 1 time in total.
Reason: marking with green check mark
Re: Solaris 11 and Nagios 4.2.0
Oracle has some official documentation for Solaris 11:
http://www.oracle.com/technetwork/artic ... 79071.html
Could you run through their steps and let us know if you encounter any issues along the way?
Former Nagios employee
https://www.mcapra.com/
Re: Solaris 11 and Nagios 4.2.0
Thanks for the pointer. I had a look at the Oracle documentation and gave it a go, even though it was written for version 4.0.2.
I first tried to compile with Sun Studio 12.3, which proved to be a near-impossible task, so I reverted to GCC ("gcc version 4.8.2 (GCC)"), as the configure script seems to be optimized for GCC.
The "structure" issue quoted in the Oracle document seems to have been resolved since version 4.1.1:
"this version of Nagios defines a structure (struct comment) that conflicts with a system structure of the same name in /usr/include/sys/pwd.h. Perform the following steps to fix this issue."
so I did not apply the changes to the source and make files that it describes.
I ran ./configure with the following switches:
./configure --prefix=/usr/local/nagios \
--with-command-group=nagcmd \
--with-httpd-conf=/etc/apache2/2.4/conf.d \
--with-gd-inc=/usr/include/gd2 \
--disable-statuswrl
Configure runs fine and produces the following summary:
*** Configuration summary for nagios 4.2.0 08-01-2016 ***:
General Options:
-------------------------
Nagios executable: nagios
Nagios user/group: nagios,nagios
Command user/group: nagios,nagcmd
Event Broker: yes
Install ${prefix}: /usr/local/nagios
Install ${includedir}: /usr/local/nagios/include/nagios
Lock file: ${prefix}/var/nagios.lock
Check result directory: ${prefix}/var/spool/checkresults
Init directory: /etc/init.d
Apache conf.d directory: /etc/apache2/2.4/conf.d
Mail program: /usr/bin/mail
Host OS: solaris2.11
IOBroker Method: poll
Web Interface Options:
------------------------
HTML URL: http://localhost/nagios/
CGI URL: http://localhost/nagios/cgi-bin/
Traceroute (used by WAP): /usr/bin/traceroute
Running "gmake all" compiles all the components with GCC without errors, producing only the following warning:
workers.c: In function 'wproc_run_job':
workers.c6: warning: format '%lu' expects argument of type 'long unsigned int', but argument 7 has type 'ssize_t' [-Wformat=]
wp->name, ret, kvvb->bufsize, written, errno, strerror(errno));
^
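The warning concerns the format string: `%lu` expects an `unsigned long`, but the argument passed is an `ssize_t`, which is signed and whose width depends on the ABI. On a 32-bit (ILP32) build, which the addresses in the truss output further down suggest, both types are 32 bits wide, so the mismatch is unlikely to cause a crash by itself. A minimal sketch of the usual portable fix, assuming C99; `format_written` and its message text are hypothetical and are not the actual code in workers.c:

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: formats a byte count returned by write(2).
 * ssize_t is signed and its width varies between ILP32 and LP64, so
 * casting to long and printing with %ld is portable on both.
 * Returns snprintf's result (the length of the full message). */
int format_written(char *buf, size_t len, const char *name, ssize_t written)
{
    return snprintf(buf, len, "%s: wrote %ld bytes", name, (long)written);
}
```

`%zd` is the other common choice, although the C standard defines it for the signed type corresponding to `size_t`, which is `ssize_t` on typical POSIX systems but is not formally guaranteed to be.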
After installing all the files in /usr/local/nagios, I still get the "Bus Error" when verifying the default sample config files:
:nagios# ./bin/nagios -v ./etc/nagios.cfg
Nagios Core 4.2.0
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-01-2016
License: GPL
Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Bus Error
Anything else I can do from my side to shed some more light on the "Bus Error" problem?
Regards
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
Re: Solaris 11 and Nagios 4.2.0
I suspect the issue you are having is going to be fixed in the next version.
I think the maint branch has the update that resolves this issue:
https://github.com/NagiosEnterprises/na ... tree/maint
I am in the process of putting together installation documentation which will include Solaris, it's probably a month away from being published.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Solaris 11 and Nagios 4.2.0
Thank you for the feedback and I am looking forward to the Solaris install documentation.
I did some more digging on my side and ran truss against the nagios binary as follows:
/usr/local/nagios:# truss ./bin/nagios -v ./etc/nagios.cfg
This produced more detail on the Bus Error. The extract of interest is below; it seems to indicate an issue reading the default localhost.cfg file:
open("/usr/local/nagios/etc/objects/localhost.cfg", O_RDONLY) = 6
fstat(6, 0xFFBFF3B8) = 0
mmap(0x00000000, 5379, PROT_READ, MAP_PRIVATE, 6, 0) = 0xFEF90000
munmap(0xFEF90000, 5379) = 0
close(6) = 0
munmap(0xFEFA0000, 44831) = 0
close(5) = 0
Incurred fault #5, FLTACCESS %pc = 0x000A1288
siginfo: SIGBUS BUS_ADRALN addr=0x00000021
Received signal #10, SIGBUS [default]
siginfo: SIGBUS BUS_ADRALN addr=0x00000021
If I use an empty localhost.cfg file, I get the same error. For reference, I have attached the complete output from the truss command.
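truss reports the fault as `SIGBUS BUS_ADRALN addr=0x00000021`. The same two fields, the signal code and the faulting address, can be read inside the process from the `siginfo_t` passed to an `SA_SIGINFO` handler. A minimal debugging sketch, assuming a POSIX system; `bus_code_name` and `install_sigbus_reporter` are hypothetical names, and the handler only uses `write(2)` and `_exit(2)` because those are async-signal-safe:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: names the SIGBUS si_code values defined by POSIX. */
const char *bus_code_name(int code)
{
    switch (code) {
    case BUS_ADRALN: return "BUS_ADRALN"; /* invalid address alignment */
    case BUS_ADRERR: return "BUS_ADRERR"; /* nonexistent physical address */
    case BUS_OBJERR: return "BUS_OBJERR"; /* object-specific hardware error */
    default:         return "unknown";
    }
}

/* Writes n bytes to stderr; errors are ignored inside a signal handler. */
static void put_bytes(const char *s, size_t n)
{
    if (write(STDERR_FILENO, s, n) < 0)
        return;
}

/* Writes a NUL-terminated string without calling stdio. */
static void put_str(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    put_bytes(s, n);
}

/* Writes addr as 0x-prefixed lowercase hex without calling stdio. */
static void put_hex(uintptr_t addr)
{
    char buf[2 + sizeof addr * 2];
    size_t i = sizeof buf;
    do {
        buf[--i] = "0123456789abcdef"[addr & 0xf];
        addr >>= 4;
    } while (addr != 0 && i > 2);
    buf[--i] = 'x';
    buf[--i] = '0';
    put_bytes(buf + i, sizeof buf - i);
}

/* Prints the same fields truss shows, then exits: returning from a
 * SIGBUS handler would simply re-run the faulting instruction. */
static void sigbus_reporter(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    put_str("SIGBUS ");
    put_str(bus_code_name(info->si_code));
    put_str(" addr=");
    put_hex((uintptr_t)info->si_addr);
    put_str("\n");
    _exit(128 + SIGBUS);
}

/* Installs the reporter for SIGBUS; returns 0 on success, -1 on error. */
int install_sigbus_reporter(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigbus_reporter;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

Installing the handler replaces the default action for SIGBUS, so no core file is written while it is active.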
Re: Solaris 11 and Nagios 4.2.0
Might be memory alignment?
https://www.litespeedtech.com/support/f ... rashes.75/
Our Core dev would need to look at this. I'll send it his way and see what he thinks.
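Some background on the alignment theory: SPARC requires multi-byte loads and stores to be naturally aligned, and a misaligned access raises SIGBUS, whereas most x86 instructions tolerate it. The address in the truss output, 0x21, is odd, so it is not aligned for any type wider than one byte. A minimal sketch of the usual defensive pattern, assuming C99; `is_aligned` and `load_u32` are hypothetical helpers, not Nagios code:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if addr is a multiple of align (align must be a power of two). */
bool is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Reads a uint32_t from any byte offset. Dereferencing a misaligned
 * (const uint32_t *) is undefined behaviour in C and faults with SIGBUS
 * on SPARC; memcpy lets the compiler emit alignment-safe loads. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

For example, `is_aligned(0x21, 4)` is false, which is consistent with truss classifying the fault as `BUS_ADRALN`.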
Former Nagios employee
Re: Solaris 11 and Nagios 4.2.0
FYI, I posted the issue on GitHub - https://github.com/NagiosEnterprises/na ... issues/285 to make sure it does not "fall through the cracks". Thank you!
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Solaris 11 and Nagios 4.2.0
I have the fix in the 'maint' branch of nagios core (https://github.com/NagiosEnterprises/na ... tree/maint) via commit https://github.com/NagiosEnterprises/na ... 74340c63de.
Edit: Sorry, wrong forum post. The problem mentioned here is not resolved yet. See my next message.
Re: Solaris 11 and Nagios 4.2.0
I compiled nagios core on a Solaris 11 SPARC system and ran it (with and without the '-v') as a regular user. I did not get a SIGBUS error. I used version 4.2.1 and compiled with gcc 4.9.2. It's a slightly different version of core and a different version of gcc, so that may or may not have something to do with it.
Your truss output shows two instances of munmap and close, so that means it has completed reading the config files. The localhost.cfg is just the last one it reads.
siginfo: SIGBUS BUS_ADRALN addr=0x00000021 -- That looks like an invalid memory address. So it's probably not an alignment issue.
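As a rough illustration of why 0x21 reads as an invalid address: the first page of a process's address space is normally left unmapped, so a fault at a very small address usually means a NULL (or nearly NULL) pointer was dereferenced at a small offset. A minimal heuristic sketch, assuming POSIX `sysconf(_SC_PAGESIZE)`; `in_first_page` and `system_page_size` are hypothetical helpers:

```c
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Heuristic: a faulting address inside the first page is almost never
 * mapped and usually points to a NULL pointer plus a small offset. */
bool in_first_page(uintptr_t addr, uintptr_t page)
{
    return addr < page;
}

/* Returns the system page size, falling back to 4096 if sysconf fails. */
uintptr_t system_page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? (uintptr_t)ps : 4096u;
}
```

With this check, `in_first_page(0x21, system_page_size())` is true, while an address such as the mmap result 0xFEF90000 from the truss output is not.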
I would like you to try core 4.2.1. After configuring and making, run make install-unstripped instead of make install, so the executable retains debug info. Change your nagios.cfg file to enable core dumps (set daemon_dumps_core=1 near the bottom). Then run nagios with the "-v" option again, as a user other than root. If it still has problems, find and upload the core dump file so I can take a look. Also try running it under truss and upload that output as well. With the debug info in the executable, both the truss output and the core file should provide better information.
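Whether a core file is actually written also depends on the process's RLIMIT_CORE resource limit and, on Solaris, on the coreadm(1M) settings, independently of nagios.cfg. A minimal sketch, assuming POSIX `getrlimit`/`setrlimit`, of raising the soft core-size limit to the hard limit from inside a process; `enable_core_dumps` is a hypothetical helper and not how Nagios implements `daemon_dumps_core`:

```c
#include <sys/resource.h>

/* Hypothetical helper: raises the soft RLIMIT_CORE limit to the hard
 * limit so a crash can write a core file. Unprivileged processes may
 * raise the soft limit up to, but not beyond, the hard limit.
 * Returns 0 on success, -1 on failure (errno set by the failing call). */
int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

From the shell, the equivalent is raising `ulimit -c` in the shell that launches nagios.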
Re: Solaris 11 and Nagios 4.2.0
Thanks, I got Nagios 4.2.1 compiled and running stably. Even when running ./bin/nagios -c ./etc/nagios.cfg as root, there is no "Bus Error". The daemon is stable and running, but I do observe an increase in system load from an average of 2 to an average of between 8 and 9.
I also see a lot of <defunct> processes when running "ps -fu nagios" (output attached), which is a bit of a concern. Running ptree on one of the <defunct> processes shows this:
/$ ptree 29058
4123 zsched
27126 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
27127 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios
29058 <defunct>
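A <defunct> (zombie) entry is a child that has exited but whose parent has not yet collected its exit status with wait(2) or waitpid(2). In the ptree output above, the parent of the <defunct> entry is the worker process (PID 27127). A minimal sketch, assuming POSIX, of the non-blocking reaping loop a parent typically runs; `reap_children` is a hypothetical helper and not the Nagios worker code:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Collects every child that has already exited, without blocking.
 * Returns the number reaped; each reaped child stops showing up as
 * <defunct> in ps. */
int reap_children(void)
{
    int reaped = 0;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue; /* interrupted by a signal: retry */
        break; /* 0: children still running; -1/ECHILD: no children */
    }
    return reaped;
}
```

A parent usually calls a loop like this from its main loop or after a SIGCHLD; zombies accumulate when that step is skipped or delayed.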
I have set the "check_workers" parameter to "8" in nagios.cfg; otherwise the worker processes seem to run away when they are allocated dynamically based on 1.5 * the number of CPUs. I have also attached the truss output from one of the worker PIDs for reference.
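The sizing rule described above can be written out as a small function. A minimal sketch following that description (1.5 workers per online CPU unless check_workers is set); `default_worker_count` and `online_cpu_count` are hypothetical, and any minimum or rounding that Nagios applies internally is not modeled:

```c
#include <unistd.h>

/* Hypothetical: mirrors the rule described in the post.
 * configured > 0 plays the role of check_workers in nagios.cfg. */
long default_worker_count(long online_cpus, long configured)
{
    if (configured > 0)
        return configured;      /* an explicit check_workers wins */
    if (online_cpus < 1)
        online_cpus = 1;        /* sysconf() returns -1 on failure */
    return online_cpus * 3 / 2; /* 1.5 x CPUs, integer arithmetic */
}

/* Number of CPUs currently online, as reported by sysconf(). */
long online_cpu_count(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

SPARC systems often expose many hardware threads as CPUs. On a hypothetical 64-thread machine this rule gives 96 workers, compared with the fixed 8 set here.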
Thanks for all the assistance so far
Jaco Lesch
SAIX HLS
- Attachments
- truss-ng-worker.txt: Truss trace output from a nagios worker process (1.15 MiB)
- ps-nagios.txt: Output from "ps -fu nagios" (4.58 KiB)