Unit nagios.service entered failed state

idemia-cl · Post by **idemia-cl** » Sun Feb 06, 2022 11:01 pm

Hello,

I just migrated from Nagios XI 5.6.14 (CentOS 6) to 5.8.7 (CentOS 7). Since then, Nagios XI has been going down constantly, approximately every 15 minutes. The only thing I see in the log is the following:

Feb 7 00:39:20 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 20673.
Feb 7 00:39:33 s1t2nagios02 nagios: Caught SIGSEGV, shutting down...
Feb 7 00:39:33 s1t2nagios02 systemd: nagios.service: main process exited, code=exited, status=254/n/a
Feb 7 00:39:33 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 7 00:39:33 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 7 00:39:33 s1t2nagios02 systemd: nagios.service failed.

The migration and update were done offline.
If you need more information, let me know. I would appreciate your support.

Pablo.

idemia-cl · Post by **idemia-cl** » Mon Feb 07, 2022 9:45 am

Hello, I have performed the following actions:

- Run the database repair script
/usr/local/nagiosxi/scripts/repair_databases.sh

- Restart nagios full stack
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
pkill -9 -u nagios
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
rm -rf /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagiosxi/var/event_handler.lock
rm -rf /var/run/nagios.lock
rm -rf /usr/local/nagios/var/nagios.lock
systemctl start nagios
systemctl start npcd
systemctl start crond
systemctl restart httpd

Nagios no longer crashes as often, but we still have the problem, with the following messages:

[1644244397] wproc: iocache_capacity() is -1048576 for worker Core Worker 28830.
[1644244409] NDO-3: mysql_ping: Unknown error. Is the database running?
[1644244409] Caught SIGSEGV, shutting down...
[1644244409] Caught SIGTERM, shutting down...
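
For anyone comparing these entries with /var/log/messages: the bracketed number at the start of each nagios.log line is a Unix epoch timestamp. Below is a minimal sketch that rewrites it as a readable UTC time. The function name `humanize_nagios_log` is made up for this example, and it assumes GNU `date` (the `-d @SECONDS` form), as found on CentOS 7.

```shell
#!/bin/sh
# Rewrite the leading [epoch] stamp of each nagios.log line as a UTC time.
# humanize_nagios_log is a hypothetical helper name; requires GNU date.
humanize_nagios_log() {
    while IFS= read -r hl_line; do
        case $hl_line in
            \[[0-9]*\]\ *)
                # Text between the first "[" and the first "]".
                hl_epoch=${hl_line#\[}
                hl_epoch=${hl_epoch%%\]*}
                # Everything after the first "] ".
                hl_rest=${hl_line#*"] "}
                case $hl_epoch in
                    *[!0-9]*)
                        # Not purely numeric: pass the line through untouched.
                        printf '%s\n' "$hl_line"
                        ;;
                    *)
                        printf '[%s] %s\n' \
                            "$(date -u -d "@$hl_epoch" '+%Y-%m-%d %H:%M:%S')" \
                            "$hl_rest"
                        ;;
                esac
                ;;
            *)
                printf '%s\n' "$hl_line"
                ;;
        esac
    done
}
```

Example use: `humanize_nagios_log < /usr/local/nagios/var/nagios.log | tail -n 20`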

ssax · Post by **ssax** » Mon Feb 07, 2022 11:18 am

Please PM me a copy of your profile.zip, you can download it from Admin > System Profile by clicking the Download Profile button.

Please PM me this file as well:

/usr/local/nagios/var/nagios.log

idemia-cl · Post by **idemia-cl** » Mon Feb 07, 2022 12:12 pm

Hello, I have attached the requested files.

Post by **pbroste** » Tue Feb 08, 2022 1:40 pm

Hello @idemia-cl

Thanks for sending over the System Profile and nagios.log.

I want to start off by verifying and correcting any errors in the Core config preflight check, as we see an issue with the ARAF host config:

/usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg

Then run through the following script to verify permissions:

su - nagios
cd /usr/local/nagiosxi/scripts
./reconfigure_nagios.sh

I see that you are using Postgres, so I would like you to go through this:
https://support.nagios.com/kb/article/n ... r-754.html

Restart nagios.service (by running: systemctl restart nagios) and then verify the Core config:

/usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg

Please let us know how things look,
Perry

idemia-cl · Post by **idemia-cl** » Thu Feb 17, 2022 7:43 am

Hi, we followed the instructions, and I am attaching the output of the command
"/usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg" (before and after).
We also increased the open files parameter in limits.conf, because it was still at the default.

But the message persists: "NDO-3: mysql_ping: Unknown error. Is the database running?"

Regards.
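
A note on the limits.conf change: limits.conf is applied through PAM for login sessions, so a daemon started by systemd may not pick it up. One way to confirm what the running process actually received is to read its /proc/<pid>/limits file. A minimal sketch follows. The function name `open_files_limit` is made up for this example, and the `pgrep` lookup in the usage line assumes the daemon's process name is exactly `nagios`.

```shell
#!/bin/sh
# Print the soft and hard "Max open files" values from a /proc/<pid>/limits file.
# open_files_limit is a hypothetical helper name.
open_files_limit() {
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "$1"
}

# Usage (assumption: the daemon runs as a process named "nagios";
# -o picks the oldest matching process, -x requires an exact name match):
# open_files_limit "/proc/$(pgrep -o -x nagios)/limits"
```

If the values shown are still the defaults, the limit would need to be raised where the service itself is started rather than in limits.conf.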

ssax · Post by **ssax** » Thu Feb 17, 2022 6:25 pm

Make sure this is done on the EL7 system:

https://support.nagios.com/kb/article-754.html

Edit this file:

/usr/local/nagiosxi/html/config.inc.php

Above this (line 32):

// db-specific connection information

Add this:

// Database connection type
// 1 = persistent, 0 = normal
$cfg['db_conn_persistent'] = 1;

// db-specific connection information

I would also make sure your /etc/my.cnf has these under the [mysqld] section:

max_connections=1000
max_allowed_packet=512M

Then restart the services and test again:

systemctl restart mariadb httpd nagios crond

See if those changes resolve it.
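
To double-check that the two my.cnf values ended up under the right section, something like the following sketch may help. The function name `mysqld_setting` is made up for this example. It reads only the one file it is given (it does not follow `!include` or `!includedir` directives), and it treats `-` and `_` in option names as equivalent, as MySQL and MariaDB do.

```shell
#!/bin/sh
# Print the value of an option from the [mysqld] section of a my.cnf-style file.
# mysqld_setting is a hypothetical helper; if the option is repeated, the last one wins.
mysqld_setting() {
    ms_key=$1
    ms_file=${2:-/etc/my.cnf}
    awk -v key="$ms_key" '
        # A section header: remember whether it is exactly [mysqld].
        /^[ \t]*\[/ { in_mysqld = ($0 ~ /^[ \t]*\[mysqld\][ \t]*$/); next }
        # Skip everything outside [mysqld], comments, and lines without "=".
        !in_mysqld { next }
        /^[ \t]*[#;]/ { next }
        index($0, "=") == 0 { next }
        {
            name = substr($0, 1, index($0, "=") - 1)
            gsub(/[ \t]/, "", name)
            gsub(/-/, "_", name)
            if (name == key) {
                val = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                found = val
            }
        }
        END { if (found != "") print found }
    ' "$ms_file"
}
```

Example use: `mysqld_setting max_connections` and `mysqld_setting max_allowed_packet /etc/my.cnf`. After the restart, the value the server is really using can also be queried with `SHOW VARIABLES LIKE 'max_connections';`.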

idemia-cl · Post by **idemia-cl** » Fri Feb 25, 2022 11:04 am

Hello, we have applied the changes you gave us. The message "NDO-3: mysql_ping: Unknown error. Is the database running?" no longer appears, but the platform still goes down.

The following are the messages in /var/log/messages that we see when the platform goes down:

Feb 25 09:51:29 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 25 09:51:29 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 25 10:29:55 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 13984.
Feb 25 10:59:51 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 13991.
Feb 25 10:59:51 s1t2nagios02 nagios: wproc: iocache_capacity() is -795155 for worker Core Worker 13991.
Feb 25 10:59:51 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 25 10:59:51 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 25 11:03:28 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 3634.
Feb 25 11:33:29 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 3638.
Feb 25 11:33:34 s1t2nagios02 nagios: wproc: iocache_capacity() is -795155 for worker Core Worker 3638.
Feb 25 11:33:34 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 25 11:33:34 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 25 11:49:15 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 6493.
Feb 25 12:19:15 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 6490.
Feb 25 12:49:40 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 25 12:51:10 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 25 12:58:14 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 4279.
Feb 25 12:58:14 s1t2nagios02 nagios: wproc: iocache_capacity() is -795154 for worker Core Worker 4279.

We would appreciate your support.
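
To see how often the service is failing, the "entered failed state" lines can be pulled out of a log like the one above. This is a minimal sketch only. The function name `nagios_failures` is made up for this example, and it assumes the classic syslog line layout (month, day, time, host, program) shown in this post.

```shell
#!/bin/sh
# List each time systemd marked nagios.service as failed, then print a total.
# Reads syslog-format lines from stdin. nagios_failures is a hypothetical helper name.
nagios_failures() {
    awk '
        /systemd: Unit nagios\.service entered failed state/ {
            print $1, $2, $3   # month, day, time
            n++
        }
        END { printf "total failures: %d\n", n }
    '
}
```

Example use: `nagios_failures < /var/log/messages`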

Post by **kfanselow** » Tue Mar 01, 2022 5:36 pm

Hi Pablo,

Could you walk us through the steps of your migration?

Thanks and Best Regards,
Keith

idemia-cl · Post by **idemia-cl** » Wed Mar 02, 2022 7:48 am

Yes. Previously we had an EL6 (CentOS 6) server running Nagios XI 5.6.14. We decided to upgrade because of vulnerabilities detected in that version, as well as problems with Nagios crashing.

We installed a new server with EL7 (CentOS 7) and installed Nagios XI 5.6.14 on it. After that, we migrated a backup to the new server, following the procedure (https://assets.nagios.com/downloads/nag ... ios-XI.pdf).

We then upgraded the new server to version 5.8.7. The entire installation was done offline, following the procedure (https://assets.nagios.com/downloads/nag ... onment.pdf).
