Unit nagios.service entered failed state

Posted: Sun Feb 06, 2022 11:01 pm
by idemia-cl
Hello,

I just migrated from Nagios XI 5.6.14 (CentOS 6) to 5.8.7 (CentOS 7). Since the migration, Nagios XI has been going down repeatedly, approximately every 15 minutes. The only thing I see in the log is the following:

Feb 7 00:39:20 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 20673.
Feb 7 00:39:33 s1t2nagios02 nagios: Caught SIGSEGV, shutting down...
Feb 7 00:39:33 s1t2nagios02 systemd: nagios.service: main process exited, code=exited, status=254/n/a
Feb 7 00:39:33 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 7 00:39:33 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 7 00:39:33 s1t2nagios02 systemd: nagios.service failed.
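
A minimal sketch for tallying these events, assuming a syslog-style file such as /var/log/messages. The function name summarize_nagios_events is hypothetical and not part of Nagios XI:

```shell
# Count nagios crash-related events in a syslog-style file.
# Usage: summarize_nagios_events /var/log/messages
summarize_nagios_events() {
    logfile="$1"
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    segv=$(grep -c 'nagios: Caught SIGSEGV' "$logfile" || true)
    term=$(grep -c 'nagios: Caught SIGTERM' "$logfile" || true)
    failed=$(grep -c 'nagios.service entered failed state' "$logfile" || true)
    printf 'SIGSEGV=%s SIGTERM=%s failed_state=%s\n' "$segv" "$term" "$failed"
}
```

Separating SIGSEGV from SIGTERM matters here because only the former indicates a crash inside the nagios process itself.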


The migration and update were done offline.
If you need more information, let me know. I would appreciate your support.

Pablo.

Re: Unit nagios.service entered failed state

Posted: Mon Feb 07, 2022 9:45 am
by idemia-cl
Hello, I have performed the following actions:

- Ran the database repair script
/usr/local/nagiosxi/scripts/repair_databases.sh

- Restarted the full Nagios stack
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
pkill -9 -u nagios
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
rm -rf /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagiosxi/var/event_handler.lock
rm -rf /var/run/nagios.lock
rm -rf /usr/local/nagios/var/nagios.lock
systemctl start nagios
systemctl start npcd
systemctl start crond
systemctl restart httpd
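
The lock and socket cleanup above could be sketched as a small function that only removes files that actually exist and reports what it removed. The function name and the optional root-prefix argument are hypothetical, added so the sketch can be tried against a scratch directory first:

```shell
# Lock and socket files named in the post above (duplicates dropped).
STALE_FILES="/usr/local/nagiosxi/var/dbmaint.lock
/usr/local/nagiosxi/var/event_handler.lock
/usr/local/nagiosxi/scripts/reconfigure_nagios.lock
/usr/local/nagios/var/ndo.sock
/var/run/nagios.lock
/usr/local/nagios/var/nagios.lock"

# remove_stale_files [ROOT]
# ROOT is a hypothetical prefix for trying this against a scratch directory;
# leave it empty to act on the real paths (run only while nagios is stopped).
remove_stale_files() {
    root="${1:-}"
    for f in $STALE_FILES; do
        if [ -e "$root$f" ]; then
            rm -f "$root$f" && echo "removed $root$f"
        fi
    done
}
```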


Nagios no longer crashes as often, but we still have the problem, with the following messages:

[1644244397] wproc: iocache_capacity() is -1048576 for worker Core Worker 28830.
[1644244409] NDO-3: mysql_ping: Unknown error. Is the database running?
[1644244409] Caught SIGSEGV, shutting down...
[1644244409] Caught SIGTERM, shutting down...
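
The bracketed numbers in nagios.log are Unix epoch seconds, which makes them hard to line up with /var/log/messages. A minimal sketch for converting them to UTC, assuming GNU date (the -d @epoch form, as shipped with CentOS 7). The function name humanize_nagios_log is hypothetical:

```shell
# Read "[epoch] message" lines on stdin and print them with a UTC timestamp.
# Lines that do not start with "[digits]" are passed through unchanged.
humanize_nagios_log() {
    while IFS= read -r line; do
        case $line in
            \[[0-9]*\]*)
                ts=${line#\[}        # drop the leading "["
                ts=${ts%%\]*}        # keep everything before the first "]"
                rest=${line#*\] }    # message text after "] "
                printf '%s %s\n' "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done
}

# Example: humanize_nagios_log < /usr/local/nagios/var/nagios.log
```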

Re: Unit nagios.service entered failed state

Posted: Mon Feb 07, 2022 11:18 am
by ssax
Please PM me a copy of your profile.zip. You can download it from Admin > System Profile by clicking the Download Profile button.

Please PM me this file as well:

Code:

/usr/local/nagios/var/nagios.log

Re: Unit nagios.service entered failed state

Posted: Mon Feb 07, 2022 12:12 pm
by idemia-cl
Hello, I have attached the requested files.

Re: Unit nagios.service entered failed state

Posted: Tue Feb 08, 2022 1:40 pm
by pbroste
Hello @idemia-cl

Thanks for sending over the System Profile and nagios.log.

We want to start off by verifying and correcting any errors in the Core config pre-flight check, as we see an issue with the ARAF host config:

Code:

/usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg

Then run through the following script to verify permissions:

Code:

su - nagios
cd /usr/local/nagiosxi/scripts
./reconfigure_nagios.sh

We also see that you are using Postgres, so please go through this article:
https://support.nagios.com/kb/article/n ... r-754.html

Restart nagios.service (by running: systemctl restart nagios) and then verify the Core config:

Code:

/usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg

Please let us know how things look,
Perry

Re: Unit nagios.service entered failed state

Posted: Thu Feb 17, 2022 7:43 am
by idemia-cl
Hi, we executed the instructions, and I am attaching the output of the command
"/usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg" (before and after).
We also increased the open files parameter in limits.conf, because it was still at the default value.

However, the message "NDO-3: mysql_ping: Unknown error. Is the database running?" persists.
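
One way to confirm that a higher open files limit actually reached the running daemon is to read the limits file the kernel exposes for the process. A minimal sketch, assuming Linux /proc and that the daemon process is named nagios. The function name soft_nofile_limit is hypothetical:

```shell
# Print the soft "Max open files" limit from a /proc/<pid>/limits-style file.
# Columns in that file: name (3 words), soft limit, hard limit, units.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Example (on the Nagios host):
#   soft_nofile_limit "/proc/$(pgrep -o -x nagios)/limits"
```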

Regards.

Re: Unit nagios.service entered failed state

Posted: Thu Feb 17, 2022 6:25 pm
by ssax
Make sure this is done on the EL7 system:

https://support.nagios.com/kb/article-754.html

Edit this file:

Code:

/usr/local/nagiosxi/html/config.inc.php

Above this (line 32):

Code:

// db-specific connection information

Add this:

Code:

// Database connection type
// 1 = persistent, 0 = normal
$cfg['db_conn_persistent'] = 1;

// db-specific connection information

I would also make sure your /etc/my.cnf has these under the [mysqld] section:

Code:

max_connections=1000
max_allowed_packet=512M

Then restart the services and test again:

Code:

systemctl restart mariadb httpd nagios crond

See if those changes resolve it.
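
Before restarting, it may be worth checking that the two settings really sit under [mysqld] and not under another section of the file. A minimal sketch that only reads the file, assuming plain key=value lines. The function name mysqld_setting is hypothetical:

```shell
# Print the value of KEY from the [mysqld] section of a my.cnf-style file.
# Usage: mysqld_setting FILE KEY
mysqld_setting() {
    awk -F= -v key="$2" '
        # Track whether the current line is inside the [mysqld] section.
        /^\[/ { in_section = ($0 ~ /^\[mysqld\]/) }
        in_section && $1 ~ "^[ \t]*" key "[ \t]*$" {
            gsub(/[ \t]/, "", $2)   # strip whitespace around the value
            print $2
        }
    ' "$1"
}

# Example: mysqld_setting /etc/my.cnf max_connections
```

An empty result means the key was not found under [mysqld] in that file.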

Re: Unit nagios.service entered failed state

Posted: Fri Feb 25, 2022 11:04 am
by idemia-cl
Hello, we have executed the instructions you gave us. The message "NDO-3: mysql_ping: Unknown error. Is the database running?" no longer appears, but the platform still goes down.

The following are the messages in /var/log/messages that we see when the platform goes down:

Feb 25 09:51:29 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 25 09:51:29 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 25 10:29:55 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 13984.
Feb 25 10:59:51 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 13991.
Feb 25 10:59:51 s1t2nagios02 nagios: wproc: iocache_capacity() is -795155 for worker Core Worker 13991.
Feb 25 10:59:51 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 25 10:59:51 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 25 11:03:28 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 3634.
Feb 25 11:33:29 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 3638.
Feb 25 11:33:34 s1t2nagios02 nagios: wproc: iocache_capacity() is -795155 for worker Core Worker 3638.
Feb 25 11:33:34 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 25 11:33:34 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 25 11:49:15 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 6493.
Feb 25 12:19:15 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 6490.
Feb 25 12:49:40 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 25 12:51:10 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 25 12:58:14 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 4279.
Feb 25 12:58:14 s1t2nagios02 nagios: wproc: iocache_capacity() is -795154 for worker Core Worker 4279.
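
To line up the failures in time, the failed-state entries can be pulled out of the log. A minimal sketch, assuming the syslog layout shown above (month, day and time as the first three fields). The name list_failures is a hypothetical helper:

```shell
# Print the timestamp of every "entered failed state" entry for nagios.service.
# Usage: list_failures /var/log/messages
list_failures() {
    awk '/nagios\.service entered failed state/ { print $1, $2, $3 }' "$1"
}
```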

We would appreciate your support.

Re: Unit nagios.service entered failed state

Posted: Tue Mar 01, 2022 5:36 pm
by kfanselow
Hi Pablo,

Could you walk us through the steps of your migration?

Thanks and Best Regards,
Keith

Re: Unit nagios.service entered failed state

Posted: Wed Mar 02, 2022 7:48 am
by idemia-cl
Yes. Previously we had an EL6 (CentOS 6) server running Nagios XI 5.6.14. We decided to upgrade because of vulnerabilities detected in that version, as well as problems with Nagios crashing.

We installed a new server with EL7 (CentOS 7) and installed Nagios XI 5.6.14 on it. We then migrated a backup to the new server, following the procedure (https://assets.nagios.com/downloads/nag ... ios-XI.pdf).

The new server was then updated to version 5.8.7. The whole installation was done offline, following the procedure (https://assets.nagios.com/downloads/nag ... onment.pdf).