Unit nagios.service entered failed state
Hello,
I just migrated from Nagios XI 5.6.14 (CentOS 6) to 5.8.7 (CentOS 7). Currently, Nagios XI keeps going down, approximately every 15 minutes. The only thing I see in the log is the following:
Feb 7 00:39:20 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 20673.
Feb 7 00:39:33 s1t2nagios02 nagios: Caught SIGSEGV, shutting down...
Feb 7 00:39:33 s1t2nagios02 systemd: nagios.service: main process exited, code=exited, status=254/n/a
Feb 7 00:39:33 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 7 00:39:33 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 7 00:39:33 s1t2nagios02 systemd: nagios.service failed.
The migration and update were done offline.
If you need more information, let me know. I would appreciate your support.
Pablo.
Re: Unit nagios.service entered failed state
Hello, I have performed the following actions:
- Run the database repair script
/usr/local/nagiosxi/scripts/repair_databases.sh
- Restart the full Nagios stack
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
pkill -9 -u nagios
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
rm -rf /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagiosxi/var/event_handler.lock
rm -rf /var/run/nagios.lock
rm -rf /usr/local/nagios/var/nagios.lock
systemctl start nagios
systemctl start npcd
systemctl start crond
systemctl restart httpd
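Before deleting lock and socket files with rm -rf, it can help to see which of them are actually present, since a lock left behind by a crashed process is a clue in itself. A minimal POSIX sh sketch; `list_stale_locks` is a hypothetical helper, not part of Nagios XI:

```shell
# Hypothetical helper (not part of Nagios XI): print each given path that
# currently exists, one per line, so you can review what a cleanup would remove.
list_stale_locks() {
  for path in "$@"; do
    # -e is true for regular files, directories and sockets such as ndo.sock
    if [ -e "$path" ]; then
      printf '%s\n' "$path"
    fi
  done
}
```

For example, after `systemctl stop nagios` and before the rm commands: `list_stale_locks /usr/local/nagiosxi/var/dbmaint.lock /usr/local/nagios/var/ndo.sock /usr/local/nagios/var/nagios.lock /var/run/nagios.lock`.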
Nagios does not crash as often now, but we still have the problem, with the following messages:
[1644244397] wproc: iocache_capacity() is -1048576 for worker Core Worker 28830.
[1644244409] NDO-3: mysql_ping: Unknown error. Is the database running?
[1644244409] Caught SIGSEGV, shutting down...
[1644244409] Caught SIGTERM, shutting down...
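The bracketed numbers at the start of each nagios.log line are Unix epoch timestamps, which makes them awkward to line up with the syslog entries in /var/log/messages. A small POSIX sh filter, assuming GNU date (as shipped on CentOS 7); `nagios_log_humanize` is a hypothetical name. It prints UTC; drop `-u` to use the server's local time zone instead.

```shell
# Hypothetical filter: rewrite the "[epoch] " prefix of nagios.log lines read on
# stdin as "[YYYY-MM-DD HH:MM:SS] " in UTC. Lines without that prefix pass through.
# Assumes GNU date (for -d @SECONDS).
nagios_log_humanize() {
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      '['*'] '*)
        ts=${line%%]*}   # "[1644244409] Caught..." -> "[1644244409"
        ts=${ts#\[}      # -> "1644244409"
        case $ts in
          ''|*[!0-9]*) printf '%s\n' "$line"; continue ;;  # not an epoch value
        esac
        rest=${line#*] } # text after the "[epoch] " prefix
        printf '[%s] %s\n' "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
        ;;
      *) printf '%s\n' "$line" ;;
    esac
  done
}
```

Usage: `nagios_log_humanize < /usr/local/nagios/var/nagios.log | grep -E 'SIGSEGV|SIGTERM|wproc|NDO-3'`.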
Re: Unit nagios.service entered failed state
Please PM me a copy of your profile.zip. You can download it from Admin > System Profile by clicking the Download Profile button.
Please PM me this file as well:
/usr/local/nagios/var/nagios.log
Re: Unit nagios.service entered failed state
Hello, I have attached the requested files.
Re: Unit nagios.service entered failed state
Hello @idemia-cl
Thanks for sending over the System Profile and nagios.log.
Let's start by verifying and correcting any errors in the Core config pre-flight check, as we see an issue with the ARAF host config:
/usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg
Then run the following script to verify permissions:
su - nagios
cd /usr/local/nagiosxi/scripts
./reconfigure_nagios.sh
I also see that you are using Postgres, so please go through this:
https://support.nagios.com/kb/article/n ... r-754.html
Restart nagios.service (by running systemctl restart nagios) and then verify the Core config:
/usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg
Please let us know how things look,
Perry
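The verify command (nagios -v) ends with a summary that includes a `Total Warnings:` line and a `Total Errors:` line. To compare runs before and after a change, a small POSIX sh sketch can pull those counts out; `preflight_summary` is a hypothetical helper, and it assumes the summary lines are formatted as in current Nagios Core output.

```shell
# Hypothetical helper: read Nagios Core verify output on stdin, print
# "errors=N warnings=M", and exit 1 if errors > 0 (2 if no summary was found).
preflight_summary() {
  awk -F: '
    /^Total Warnings:/ { warnings = $2 + 0 }            # "+ 0" drops padding spaces
    /^Total Errors:/   { errors = $2 + 0; seen = 1 }
    END {
      if (!seen) { print "no pre-flight summary found"; exit 2 }
      printf "errors=%d warnings=%d\n", errors, warnings
      exit (errors > 0)
    }'
}
```

Usage: `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_summary`.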
Re: Unit nagios.service entered failed state
Hi, we executed the instructions, and I have attached the output of the "/usr/local/nagios/bin/nagios -vvv /usr/local/nagios/etc/nagios.cfg" command (before and after).
We also increased the open files parameter in limits.conf, because it was at the default value.
However, the message "NDO-3: mysql_ping: Unknown error. Is the database running?" persists.
Regards.
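One thing worth checking about the limits.conf change: /etc/security/limits.conf is applied by pam_limits to PAM login sessions, and services started by systemd (such as nagios.service) do not go through PAM, so the new value may never reach the Nagios daemon. For a systemd unit the limit is normally set with `LimitNOFILE=` in a drop-in (for example one created with `systemctl edit nagios`). Whether the open-file limit has anything to do with these crashes is not established here. To see what the running process actually got, a sketch that reads its /proc limits; `max_open_files` is a hypothetical helper name.

```shell
# Hypothetical helper: print the soft and hard "Max open files" values from a
# /proc/<pid>/limits listing read on stdin; exit 1 if that row is missing.
max_open_files() {
  awk '/^Max open files/ { printf "soft=%s hard=%s\n", $4, $5; found = 1 }
       END { exit !found }'
}
```

Usage, picking the oldest process named exactly `nagios` (the main daemon): `max_open_files < "/proc/$(pgrep -o -x nagios)/limits"`.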
Re: Unit nagios.service entered failed state
Make sure this is done on the EL7 system:
https://support.nagios.com/kb/article-754.html
Edit this file:
/usr/local/nagiosxi/html/config.inc.php
Above this line (line 32):
// db-specific connection information
Add this (the last line is the existing one, shown for context):
// Database connection type
// 1 = persistent, 0 = normal
$cfg['db_conn_persistent'] = 1;
// db-specific connection information
I would also make sure your /etc/my.cnf has these under the [mysqld] section:
max_connections=1000
max_allowed_packet=512M
Then restart the services and test again:
systemctl restart mariadb httpd nagios crond
See if those changes resolve it.
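The two MariaDB settings only take effect if they sit under `[mysqld]` (not, for example, under `[client]`). A minimal sketch that prints what is actually set in that section; `mysqld_settings` is a hypothetical helper, and it only recognizes the underscore spelling of the option names. On the running server, `SHOW VARIABLES LIKE 'max_connections';` in the MySQL client shows the value in effect.

```shell
# Hypothetical helper: read a my.cnf-style file on stdin and print the
# max_connections / max_allowed_packet lines found in the [mysqld] section,
# with whitespace removed.
mysqld_settings() {
  awk '
    /^[ \t]*\[/ { in_mysqld = ($0 ~ /^[ \t]*\[mysqld\]/); next }   # section header
    in_mysqld && /^[ \t]*(max_connections|max_allowed_packet)[ \t]*=/ {
      line = $0
      gsub(/[ \t]/, "", line)
      print line
    }'
}
```

Usage: `mysqld_settings < /etc/my.cnf`.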
Re: Unit nagios.service entered failed state
Hello, we have applied the instructions you gave us. The message "NDO-3: mysql_ping: Unknown error. Is the database running?" no longer appears, but the platform still keeps going down.
These are the messages in /var/log/messages that we see when the platform goes down:
Feb 25 09:51:29 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 25 09:51:29 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 25 10:29:55 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 13984.
Feb 25 10:59:51 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 13991.
Feb 25 10:59:51 s1t2nagios02 nagios: wproc: iocache_capacity() is -795155 for worker Core Worker 13991.
Feb 25 10:59:51 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 25 10:59:51 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 25 11:03:28 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 3634.
Feb 25 11:33:29 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 3638.
Feb 25 11:33:34 s1t2nagios02 nagios: wproc: iocache_capacity() is -795155 for worker Core Worker 3638.
Feb 25 11:33:34 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 25 11:33:34 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 25 11:49:15 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 6493.
Feb 25 12:19:15 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 6490.
Feb 25 12:49:40 s1t2nagios02 nagios: Caught SIGTERM, shutting down...
Feb 25 12:51:10 s1t2nagios02 systemd: Unit nagios.service entered failed state.
Feb 25 12:58:14 s1t2nagios02 nagios: wproc: iocache_capacity() is -1048576 for worker Core Worker 4279.
Feb 25 12:58:14 s1t2nagios02 nagios: wproc: iocache_capacity() is -795154 for worker Core Worker 4279.
We would appreciate your support.
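Since all of the syslog lines above are from a single day, a rough measure of how often the service fails is the gap between consecutive "entered failed state" lines. A POSIX awk sketch; `failure_gaps` is a hypothetical helper, and because it ignores the date fields, a gap that crosses midnight would come out negative.

```shell
# Hypothetical helper: read syslog lines on stdin and, for each
# "nagios.service entered failed state" line after the first, print its
# timestamp and the minutes elapsed since the previous failure.
# Assumes all lines fall on the same day (the date is ignored).
failure_gaps() {
  awk '/nagios\.service entered failed state/ {
    split($3, t, ":")                           # $3 is HH:MM:SS
    secs = t[1] * 3600 + t[2] * 60 + t[3]
    if (seen) printf "%s %s %s gap=%.1f min\n", $1, $2, $3, (secs - prev) / 60
    prev = secs; seen = 1
  }'
}
```

Usage: `grep nagios /var/log/messages | failure_gaps`.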
Re: Unit nagios.service entered failed state
Hi Pablo,
Could you walk us through the steps of your migration?
Thanks and Best Regards,
Keith
Re: Unit nagios.service entered failed state
Yes. Previously we had an EL6 (CentOS 6) server running Nagios XI 5.6.14. We decided to upgrade because of vulnerabilities detected in that version, as well as problems with Nagios crashing.
We installed a new EL7 (CentOS 7) server and installed Nagios XI 5.6.14 on it. After this, we migrated a backup to the new server, following this procedure (https://assets.nagios.com/downloads/nag ... ios-XI.pdf).
Then we updated the new server to version 5.8.7. The whole installation was done offline, following this procedure (https://assets.nagios.com/downloads/nag ... onment.pdf).