
can't get the nagios service to run

Posted: Thu Sep 03, 2020 3:27 pm
by benhank
I upgraded Nagios XI from version 5.4.12 to version 5.7.2, and now the nagios service doesn't run:

Code:

[root@lkennagiost01 nagiosxi]# service nagios start
Starting nagios: done.
[root@lkennagiost01 nagiosxi]# service nagios status
nagios is not running
[root@lkennagiost01 nagiosxi]#
I had a similar issue a while ago, and I modified the following files to point to the correct location of the nagios.lock file:

Code:

/usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh
/usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh
/usr/local/nagios/etc/nagios.cfg
/etc/rc.d/init.d/nagios
All four are set to:

Code:

/var/run/nagios.lock
The file exists in that directory and contains a PID.
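A quick way to confirm that these files really agree is to print every nagios.lock path each one references. This is a minimal sketch assuming bash and GNU grep; `show_lock_paths` is a hypothetical helper name, and paths assembled from shell variables will show up with the variable text rather than its expanded value.

```shell
#!/bin/bash
# Hypothetical helper: list every "...nagios.lock" path found in each given
# file, prefixed with the file name, so a mismatch stands out.
# It only reads the files; nothing is modified.
show_lock_paths() {
  local f
  for f in "$@"; do
    if [ -r "$f" ]; then
      # -o prints only the matching path, -H keeps the file name as a prefix;
      # the path may not contain whitespace, double quotes or "="
      grep -oH '[^[:space:]"=]*nagios\.lock' "$f" | sort -u
    else
      echo "$f: not readable" >&2
    fi
  done
}
```

For example, `show_lock_paths /usr/local/nagios/etc/nagios.cfg /etc/rc.d/init.d/nagios /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint.sh /usr/local/nagiosxi/scripts/nom_restore_nagioscore_checkpoint_specific.sh`. Any file that prints a different path is the one to look at.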

Code:

[root@lkennagiost01 ~]# tail -20 /usr/local/nagios/var/nagios.log
[1599236832] wproc: Registry request: name=Core Worker 19147;pid=19147
[1599236832] wproc: Registry request: name=Core Worker 19148;pid=19148
[1599236832] wproc: Registry request: name=Core Worker 19149;pid=19149
[1599236832] wproc: Registry request: name=Core Worker 19150;pid=19150
[1599236832] wproc: Registry request: name=Core Worker 19151;pid=19151
[1599236832] wproc: Registry request: name=Core Worker 19152;pid=19152
[1599236832] wproc: Registry request: name=Core Worker 19153;pid=19153
[1599236832] wproc: Registry request: name=Core Worker 19154;pid=19154
[1599236832] wproc: Registry request: name=Core Worker 19155;pid=19155
[1599236832] wproc: Registry request: name=Core Worker 19156;pid=19156
[1599236832] wproc: Registry request: name=Core Worker 19157;pid=19157
[1599236832] wproc: Registry request: name=Core Worker 19158;pid=19158
[1599236832] wproc: Registry request: name=Core Worker 19159;pid=19159
[1599236832] wproc: Registry request: name=Core Worker 19160;pid=19160
[1599236832] wproc: Registry request: name=Core Worker 19161;pid=19161
[1599236832] wproc: Registry request: name=Core Worker 19162;pid=19162
[1599236832] wproc: Registry request: name=Core Worker 19163;pid=19163
[1599236832] Error: Could not load module '/usr/local/nagios/bin/ndomod.o' -> /usr/local/nagios/bin/ndomod.o: cannot open shared object file: No such file or directory
[1599236832] Error: Failed to load module '/usr/local/nagios/bin/ndomod.o'.
[1599236832] Error: Module loading failed. Aborting.
What am I missing?

Re: can't get the nagios service to run

Posted: Fri Sep 04, 2020 1:02 pm
by benjaminsmith
Hi Ben,

In 5.7.x, Nagios XI uses a new database broker module called ndo3 (ndo.so), and it looks like your system is still trying to load the older ndomod.o broker module, which no longer exists on disk:

Code:

[1599236832] Error: Could not load module '/usr/local/nagios/bin/ndomod.o' -> /usr/local/nagios/bin/ndomod.o: cannot open shared object file: No such file or directory
[1599236832] Error: Failed to load module '/usr/local/nagios/bin/ndomod.o'.
[1599236832] Error: Module loading failed. Aborting.
Edit /usr/local/nagios/etc/nagios.cfg and make sure this line is commented out:

Code:

#broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
And make sure this line is uncommented:

Code:

broker_module=/usr/local/nagios/bin/ndo.so /usr/local/nagios/etc/ndo.cfg
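The two nagios.cfg changes above can be scripted roughly as follows. This is a sketch, not an official Nagios tool: `switch_to_ndo3` is a hypothetical name, GNU sed is assumed for `-i.bak`, and it also appends the ndo.so line when it is missing entirely, which is what turned out to be the case in this thread.

```shell
#!/bin/bash
# Hypothetical helper: comment out the old ndomod.o broker line and make sure
# the ndo.so broker line is present and active in the given nagios.cfg.
# Assumes GNU sed; the untouched original is kept as <file>.bak.
switch_to_ndo3() {
  local cfg="$1"
  local ndo3_line='broker_module=/usr/local/nagios/bin/ndo.so /usr/local/nagios/etc/ndo.cfg'

  # 1. Comment out any active line that loads the old ndomod.o module.
  sed -i.bak 's|^[[:space:]]*broker_module=/usr/local/nagios/bin/ndomod\.o|#&|' "$cfg"

  # 2. Uncomment the ndo.so line if it exists but is commented out.
  sed -i 's|^#[[:space:]]*\(broker_module=/usr/local/nagios/bin/ndo\.so\)|\1|' "$cfg"

  # 3. Append the ndo.so line if it is still missing.
  grep -q '^broker_module=/usr/local/nagios/bin/ndo\.so' "$cfg" ||
    printf '%s\n' "$ndo3_line" >> "$cfg"
}
```

Usage: `switch_to_ndo3 /usr/local/nagios/etc/nagios.cfg`, then `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg` runs Nagios' own configuration check before you start the service. Running the helper twice does not add a second ndo.so line.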
Then start the nagios service:

Code:

service nagios start
# on systemd-based releases (CentOS/RHEL 7+), use: systemctl start nagios
If you're still not able to get it started, please send us the profile. Thanks, Benjamin

To send us your system profile:
Login to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" Menu
Click the "Download Profile" button
Save the profile.zip file and share it in a private message or upload it to the post/ticket, then reply to this post to bring it up in the queue.

Re: can't get the nagios service to run

Posted: Fri Sep 04, 2020 2:38 pm
by benhank
That seems to have done the trick. The line you said to uncomment wasn't there, so I added it, and now the nagios service seems to be running.


However, I now have this issue:

Code:

[1599247086] WARNING: RLIMIT_NPROC is 63414, total max estimated processes is 160244! You should increase your limits (ulimit -u, or limits.conf)
[1599247087] NDO-3: Database initialized
[1599247089] Successfully launched command file worker with pid 1215
And nothing shows up in the monitoring events queue:
[Screenshot: Capture.PNG]

Re: can't get the nagios service to run

Posted: Fri Sep 04, 2020 3:44 pm
by benhank
My profile is too big; it's 42 MB.

Re: can't get the nagios service to run

Posted: Fri Sep 04, 2020 5:41 pm
by benjaminsmith
Hi,
benhank wrote: "That seems to have done the trick. The line you said to uncomment wasn't there, so I added it, and now the nagios service seems to be running."
If everything is working, there's no need to send the profile. To clear the RLIMIT_NPROC warning, raise the process limit: edit /etc/security/limits.conf and add the following lines to the bottom of the file.

Code:

*          soft     nproc          262144
*          hard     nproc          262144
Save the file and reboot the server for the change to take effect.
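To make this step repeatable and to check whether it actually took effect, two small helpers may be useful. This is a sketch assuming bash on Linux; `add_nproc_limits` and `show_nproc_limit` are hypothetical names. One caveat worth knowing: limits.conf is read by pam_limits when a PAM session starts, so a daemon launched by an init script at boot may not inherit these values. Reading /proc/<pid>/limits for the running nagios process shows the limit it really has.

```shell
#!/bin/bash
# Hypothetical helper: append "*  soft/hard  nproc  <value>" entries to a
# limits file, skipping any type (soft or hard) that already has an nproc
# line for "*", so running it twice does not duplicate entries.
add_nproc_limits() {
  local file="${1:-/etc/security/limits.conf}" value="${2:-262144}" type
  for type in soft hard; do
    grep -Eq "^\*[[:space:]]+${type}[[:space:]]+nproc" "$file" ||
      printf '*          %-8s nproc          %s\n' "$type" "$value" >> "$file"
  done
}

# Print the "Max processes" line for a running process (Linux /proc only).
# This is the limit the process actually got, whatever limits.conf says.
show_nproc_limit() {
  grep 'Max processes' "/proc/$1/limits"
}
```

For example, as root: `add_nproc_limits /etc/security/limits.conf 262144`, and after Nagios is running again: `show_nproc_limit "$(cat /var/run/nagios.lock)"`, using the lock file path from earlier in this thread. If the value shown is still the old one, the new limit is not reaching the daemon.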

Reference:
https://www.thegeekdiary.com/how-to-set ... -rhel-567/

Re: can't get the nagios service to run

Posted: Tue Sep 08, 2020 8:19 am
by benhank
It's not working. I ran the commands and rebooted, but here is the result:

Code:

[root@lkennagiost01 ~]# service nagios status
nagios (pid 2834) is running...
[root@lkennagiost01 ~]# killall  -9 nagios
[root@lkennagiost01 ~]# service nagios start
Starting nagios: done.
[root@lkennagiost01 ~]#
[Screenshot: Capture.PNG]

Re: can't get the nagios service to run

Posted: Tue Sep 08, 2020 11:33 am
by benhank
So the service is running, but I don't think the checks are being executed properly:
[Screenshot: Capture.PNG]

Re: can't get the nagios service to run

Posted: Tue Sep 08, 2020 2:44 pm
by benhank
OK, I've uploaded the profile.

Re: can't get the nagios service to run

Posted: Wed Sep 09, 2020 12:25 pm
by benjaminsmith
Hi Ben,

The nagios service is running, so checks are being processed, but the results are not being written to the database. Please take a backup or snapshot first:

https://assets.nagios.com/downloads/nag ... ios-XI.pdf
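If a full VM snapshot isn't an option, a dump of just the nagios database is a minimal fallback before truncating anything. This sketch assumes mysqldump and gzip are installed and that the credentials match the truncate command used in this thread (root / nagiosxi); `backup_nagios_db` is a hypothetical helper and does not replace the full Nagios XI backup procedure in the document linked above.

```shell
#!/bin/bash
# Hypothetical helper: dump the "nagios" database to a timestamped gzip file
# before truncating tables. Credentials are the ones used in this thread's
# truncate command (root / nagiosxi); change them if yours differ.
backup_nagios_db() {
  local outdir="${1:-/root}" out
  local -a rc
  out="$outdir/nagios-db-$(date +%Y%m%d-%H%M%S).sql.gz"

  mysqldump -u root -pnagiosxi nagios | gzip > "$out"
  rc=("${PIPESTATUS[@]}")   # capture both exit codes before they are overwritten

  if [ "${rc[0]}" -eq 0 ] && [ "${rc[1]}" -eq 0 ]; then
    echo "$out"             # print the backup path on success
  else
    echo "backup failed; do not truncate yet" >&2
    rm -f "$out"
    return 1
  fi
}
```

For example, `backup_nagios_db /root` prints the path of the new .sql.gz file, or returns non-zero (and removes the partial file) if the dump failed.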

Then run the following command to truncate the object and status tables in the nagios database, restart the software stack, and let me know whether check results start coming in.

Truncate tables:

Code:

echo "truncate table nagios_objects; truncate table nagios_hosts; truncate table nagios_hoststatus; truncate table nagios_services; truncate table nagios_servicestatus;" | mysql -u root -pnagiosxi nagios
Restart Nagios XI (CentOS 6):

Code:

service crond stop
service npcd stop
service nagios stop
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
service nagios start
service npcd start
service crond start
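To see directly whether check results are reaching the database after the restart, you can query one of the truncated tables. This is a sketch assuming the same credentials as the truncate command and that nagios_servicestatus has a status_update_time column, as in the NDOUtils schema; `latest_status_update` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: report how many service status rows exist and when one
# was last updated. The status_update_time column name is assumed from the
# NDOUtils schema; credentials match the truncate command in this thread.
latest_status_update() {
  # -N: no column headers, -B: tab-separated batch output
  mysql -u root -pnagiosxi -N -B \
    -e 'SELECT COUNT(*), MAX(status_update_time) FROM nagios_servicestatus;' \
    nagios
}
```

Running it a few minutes apart is a rough health check: a non-zero row count with a recent, advancing timestamp suggests results are being written, while zero rows or a stale timestamp suggests they still are not.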

Re: can't get the nagios service to run

Posted: Wed Sep 09, 2020 1:44 pm
by benhank
I ran the commands, and now it's been stuck in Pending for about an hour:
[Screenshot: Capture.PNG]