Monitoring Engine stops working

Posted: Tue Mar 28, 2017 12:10 pm
by tongchenkuo
We are having a VM host server problem that caused the root filesystem (/) to become read-only.
After a cold restart, the Monitoring Engine stopped working.
--------------------------------------------------------------------------------------------------------------------
OS: CentOS Linux release 7.2.1511 (3.10.0-327.28.2.el7.x86_64)
Nagios XI 5.3.3 manual install
GNOME installed, no proxy, no SSL
All other components (Performance Grapher, Database Backend, etc.) show green
--------------------------------------------------------------------------------------------------------------------

- Ran /usr/local/nagiosxi/scripts/repair_databases.sh; it completed successfully.

- Upgrading to 5.4.3 failed with both the manual and the automatic update, since the Monitoring Engine is not working.

- systemctl restart nagios
Job for nagios.service failed because a configured resource limit was exceeded. See "systemctl status nagios.service" and "journalctl -xe" for details.

- systemctl status nagios

nagios.service - LSB: Starts and stops the Nagios monitoring server
Loaded: loaded (/etc/rc.d/init.d/nagios; bad; vendor preset: disabled)
Active: failed (Result: resources) since Tue 2017-03-28 12:55:58 EDT; 52s ago
Docs: man:systemd-sysv-generator(8)
Process: 62162 ExecStart=/etc/rc.d/init.d/nagios start (code=exited, status=0/SUCCESS)

Mar 28 12:55:56 appprd01nagios.corp.unifirst.com nagios[62192]: wproc: Registry request: name=Core Worker 62208;pid=62208
Mar 28 12:55:56 appprd01nagios.corp.unifirst.com nagios[62192]: wproc: Registry request: name=Core Worker 62211;pid=62211
Mar 28 12:55:56 appprd01nagios.corp.unifirst.com nagios[62192]: wproc: Registry request: name=Core Worker 62212;pid=62212
Mar 28 12:55:56 appprd01nagios.corp.unifirst.com nagios[62192]: wproc: Registry request: name=Core Worker 62213;pid=62213
Mar 28 12:55:56 appprd01nagios.corp.unifirst.com nagios[62162]: Starting nagios: done.
Mar 28 12:55:56 appprd01nagios.corp.unifirst.com systemd[1]: PID 62192 read from file /usr/local/nagios/var/nagios.lock does not exist or i...ombie.
Mar 28 12:55:58 appprd01nagios.corp.unifirst.com systemd[1]: nagios.service never wrote its PID file. Failing.
Mar 28 12:55:58 appprd01nagios.corp.unifirst.com systemd[1]: Failed to start LSB: Starts and stops the Nagios monitoring server.
Mar 28 12:55:58 appprd01nagios.corp.unifirst.com systemd[1]: Unit nagios.service entered failed state.
Mar 28 12:55:58 appprd01nagios.corp.unifirst.com systemd[1]: nagios.service failed.
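The systemd lines above say it read PID 62192 from /usr/local/nagios/var/nagios.lock but found no live process for it, then gave up because the service "never wrote its PID file"; that appears to be what surfaces as "Result: resources". A quick way to see what is actually in the lock file is a small check like the one below. This is a hypothetical helper, not part of Nagios XI; it assumes Linux /proc and the lock path from the log.

```shell
#!/bin/sh
# Hypothetical helper: check whether the PID stored in the Nagios lock file
# belongs to a live, non-zombie process. Linux-only: relies on /proc.
# The default path is the lock file named in the systemd log above.
check_nagios_pidfile() {
    pidfile="${1:-/usr/local/nagios/var/nagios.lock}"

    if [ ! -r "$pidfile" ]; then
        echo "missing: $pidfile"
        return 1
    fi

    # Keep only the digits from the first line of the file.
    pid=$(head -n 1 "$pidfile" | tr -cd '0-9')
    if [ -z "$pid" ]; then
        echo "empty: $pidfile"
        return 1
    fi

    if [ ! -d "/proc/$pid" ]; then
        echo "dead: PID $pid from $pidfile is not running"
        return 1
    fi

    # The second field of the State: line is a one-letter code; Z = zombie.
    state=$(awk '/^State:/ { print $2 }' "/proc/$pid/status")
    if [ "$state" = "Z" ]; then
        echo "zombie: PID $pid from $pidfile"
        return 1
    fi

    echo "running: PID $pid ($state)"
    return 0
}
```

Paste the function into a root shell and run check_nagios_pidfile right after a failed "systemctl start nagios" to see whether the lock file is missing, stale, or pointing at a zombie.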

- /usr/local/nagios/var/nagios.log

[1490720156] ndomod: NDOMOD 2.0.0 (02-28-2014) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1490720156] ndomod: I've been compiled with support for revision 402 of the internal Nagios object structures, but the Nagios daemon is currently using revision 403. I'm going to unload so I don't cause any problems...
[1490720156] Error: Function nebmodule_init() in module '/usr/local/nagios/bin/ndomod.o' returned an error. Module will be unloaded.
[1490720156] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
[1490720156] Error: Failed to load module '/usr/local/nagios/bin/ndomod.o'.
[1490720156] Error: Module loading failed. Aborting.
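The key line in this log is ndomod refusing to load because it was compiled for object-structure revision 402 while the running Nagios daemon uses revision 403. A minimal sketch for pulling those two numbers out of nagios.log, assuming the message wording shown above (the function name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: report the ndomod vs. Nagios Core object-structure
# revisions from the most recent mismatch message in nagios.log.
# Returns 1 on a mismatch, 0 if no such message is found.
check_ndomod_revision() {
    log="${1:-/usr/local/nagios/var/nagios.log}"

    line=$(grep 'compiled with support for revision' "$log" 2>/dev/null | tail -n 1)
    if [ -z "$line" ]; then
        echo "no ndomod revision message in $log"
        return 0
    fi

    # Extract the two revision numbers from the ndomod message.
    built=$(printf '%s\n' "$line" | sed -n 's/.*support for revision \([0-9]*\).*/\1/p')
    running=$(printf '%s\n' "$line" | sed -n 's/.*currently using revision \([0-9]*\).*/\1/p')

    if [ "$built" != "$running" ]; then
        echo "mismatch: ndomod built for revision $built, core uses revision $running"
        return 1
    fi
    echo "ok: ndomod and core both use revision $built"
    return 0
}
```

It only looks at the most recent matching line, so after a fix, check the timestamp on that line (or a fresh log) before trusting the result.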

Re: Monitoring Engine stops working

Posted: Tue Mar 28, 2017 12:39 pm
by avandemore
For some reason it looks like you have a version mismatch between NDO (ndomod) and Nagios Core. Can you try upgrading to the latest version?

https://assets.nagios.com/downloads/nag ... nstall.pdf

Re: Monitoring Engine stops working

Posted: Tue Mar 28, 2017 2:46 pm
by tongchenkuo
I have tried to upgrade many times, but I still get this error message:

make[1]: Leaving directory `/tmp/nagiosxi/subcomponents/nagioscore/nagios-4.2.4'
Warning: nagios.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Job for nagios.service failed because a configured resource limit was exceeded. See "systemctl status nagios.service" and "journalctl -xe" for details.

There is no error message in the upgrade.log file; the last few lines of the log are:

*** Main program, CGIs and HTML files installed ***

You can continue with installing Nagios as follows (type 'make'
without any arguments for a list of all possible options):

make install-init
- This installs the init script in /etc/rc.d/init.d

make install-commandmode
- This installs and configures permissions on the
directory for holding the external command file

make install-config
- This installs sample config files in /usr/local/nagios/etc

make[1]: Leaving directory `/tmp/nagiosxi/subcomponents/nagioscore/nagios-4.2.4'

Re: Monitoring Engine stops working

Posted: Tue Mar 28, 2017 3:09 pm
by scottwilkerson
I haven't seen this before, but let's try forcing an upgrade of ndoutils to fix that first:

cd /tmp/nagiosxi/subcomponents/ndoutils
./upgrade

Then let's try starting Nagios.

If it starts, we can try continuing the upgrade:

cd /tmp/nagiosxi
./upgrade

Re: Monitoring Engine stops working

Posted: Tue Mar 28, 2017 3:22 pm
by tongchenkuo
It failed at this step:

cd /tmp/nagiosxi/subcomponents/ndoutils
./upgrade

*** Configuration summary for ndoutils 2.1.2 11-14-2016 ***:

General Options:
-------------------------
NDO2DB user: nagios
NDO2DB group: nagios
NDO2DB tcp port: 5668


Review the options above for accuracy. If they look
okay, type 'make all' to compile the NDO utilities,
or type 'make' to get a list of make options.

cd ./src && make
make[1]: Entering directory `/tmp/nagiosxi/subcomponents/ndoutils/ndoutils-2.1.2/src'
gcc -fPIC -fPIC -g -O2 -I/usr/include/mysql -DHAVE_CONFIG_H -c -o io.o io.c
gcc -fPIC -fPIC -g -O2 -I/usr/include/mysql -DHAVE_CONFIG_H -c -o utils.o utils.c
gcc -fPIC -g -O2 -I/usr/include/mysql -DHAVE_CONFIG_H -o file2sock file2sock.c io.o utils.o -lsystemd -lm -lnsl
gcc -fPIC -g -O2 -I/usr/include/mysql -DHAVE_CONFIG_H -o log2ndo log2ndo.c io.o utils.o -lsystemd -lm -lnsl
make ndo2db-2x
make[2]: Entering directory `/tmp/nagiosxi/subcomponents/ndoutils/ndoutils-2.1.2/src'
gcc -fPIC -g -O2 -I/usr/include/mysql -DHAVE_CONFIG_H -c -o db.o db.c
gcc -fPIC -g -O2 -I/usr/include/mysql -DHAVE_CONFIG_H -D BUILD_NAGIOS_2X -c -o dbhandlers-2x.o dbhandlers.c
gcc -fPIC -g -O2 -I/usr/include/mysql -DHAVE_CONFIG_H -D BUILD_NAGIOS_2X -o ndo2db-2x queue.c ndo2db.c dbhandlers-2x.o io.o utils.o db.o -lsystemd -lnsl -L/usr/lib64/mysql -lmysqlclient -lpthread -lz -lm -lssl -lcrypto -ldl -lm
ndo2db.c:44:31: fatal error: systemd/sd_daemon.h: No such file or directory
#include <systemd/sd_daemon.h>

Re: Monitoring Engine stops working

Posted: Tue Mar 28, 2017 3:30 pm
by scottwilkerson
Please run the following:

yum install systemd-devel -y

Then try the above again.
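On CentOS 7, systemd-devel is the package that ships the libsystemd development headers, and the header it installs is /usr/include/systemd/sd-daemon.h (with a hyphen), while line 44 of ndo2db.c asks for sd_daemon.h (with an underscore). A quick check of which spelling is present, as a hypothetical helper that assumes the default include directory:

```shell
#!/bin/sh
# Hypothetical helper: show which spelling of the sd-daemon header exists.
# systemd-devel installs sd-daemon.h (hyphen); the failing #include in
# ndo2db.c asks for sd_daemon.h (underscore).
# Returns 1 if either spelling is missing.
check_sd_daemon_header() {
    dir="${1:-/usr/include/systemd}"
    status=0
    for h in sd-daemon.h sd_daemon.h; do
        if [ -f "$dir/$h" ]; then
            echo "found: $dir/$h"
        else
            echo "missing: $dir/$h"
            status=1
        fi
    done
    return "$status"
}
```

If only the hyphenated file shows as found, installing systemd-devel alone will not make the ndo2db.c include resolve.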

Re: Monitoring Engine stops working

Posted: Tue Mar 28, 2017 3:40 pm
by jfrickson
Check your ndo2db.service and nagios.service files, probably in a directory such as /usr/lib/systemd/system/. There may be an entry in there that says either ProtectSystem=yes or ProtectSystem=full. If there is, either delete the line or set it to ProtectSystem=no. Systemd recently enabled those options, and they have caused problems on quite a few systems.
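A minimal sketch of that check, assuming the /usr/lib/systemd/system/ location mentioned above (the function name is hypothetical). Note that a unit generated from an init script, as nagios.service is in the status output earlier in this thread, has no file in that directory and will simply be reported as missing.

```shell
#!/bin/sh
# Hypothetical helper: report any ProtectSystem= setting in the Nagios and
# NDO2DB unit files. Commented-out lines (starting with #) are ignored.
check_protect_system() {
    dir="${1:-/usr/lib/systemd/system}"
    for unit in nagios.service ndo2db.service; do
        f="$dir/$unit"
        if [ ! -f "$f" ]; then
            echo "$unit: no unit file in $dir"
            continue
        fi
        # A later assignment overrides an earlier one, so keep the last value.
        value=$(sed -n 's/^[[:space:]]*ProtectSystem=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$f" | tail -n 1)
        case "$value" in
            yes|true|on|1|full|strict)
                echo "$unit: ProtectSystem=$value (remove it or set it to no)" ;;
            "")
                echo "$unit: ProtectSystem not set" ;;
            *)
                echo "$unit: ProtectSystem=$value" ;;
        esac
    done
}
```

After editing a unit file, run "systemctl daemon-reload" before restarting the service so systemd picks up the change.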

I don't know why you're getting the "fatal error: systemd/sd_daemon.h: No such file or directory" error. You could try commenting out that #include line and see if that works.

EDIT: Better yet, #undef HAVE_SYSTEMD in include/config.h

EDIT2: Or take Scott's suggestion :D

Re: Monitoring Engine stops working

Posted: Wed Mar 29, 2017 9:17 am
by tongchenkuo
The problem is solved.

It looks like there is a typo in the ndoutils source that ships with the 5.4.3 upgrade.

Following Scott's instructions, I ran:

cd /tmp/nagiosxi/subcomponents/ndoutils
./upgrade

I got the error: ndo2db.c:44:31: fatal error: systemd/sd_daemon.h: No such file or directory

Then I found that systemd/sd_daemon.h should be systemd/sd-daemon.h.

So here is what I did:

1. cp /usr/include/systemd/sd-daemon.h /usr/include/systemd/sd_daemon.h
2. cd /tmp/nagiosxi/subcomponents/ndoutils
3. ./upgrade
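An alternative to the copy in step 1 is a symlink, which keeps pointing at the real header if systemd-devel is updated later. A minimal sketch, assuming the default include directory and a root shell (the function name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: provide sd_daemon.h (underscore) as a symlink to the
# sd-daemon.h (hyphen) header installed by systemd-devel.
link_sd_daemon_header() {
    dir="${1:-/usr/include/systemd}"

    if [ ! -f "$dir/sd-daemon.h" ]; then
        echo "missing $dir/sd-daemon.h (is systemd-devel installed?)"
        return 1
    fi
    if [ -e "$dir/sd_daemon.h" ] || [ -L "$dir/sd_daemon.h" ]; then
        echo "already present: $dir/sd_daemon.h"
        return 0
    fi
    # Relative target, so the link resolves within the same directory.
    ln -s sd-daemon.h "$dir/sd_daemon.h"
}
```

Either way, the extra file lives outside any package, so it is worth removing once an ndoutils release with the corrected #include is installed.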

The Monitoring Engine is started. :D

Then I continued the 5.4.3 upgrade without any problems:

cd /tmp/nagiosxi
./upgrade

Thanks, everyone.

Re: Monitoring Engine stops working

Posted: Wed Mar 29, 2017 9:26 am
by cdienger
Glad to hear :) I assume we can close the thread at this point?

Re: Monitoring Engine stops working

Posted: Thu Mar 30, 2017 6:58 am
by tongchenkuo
Yes, this thread can be closed. Thanks!