Restarting Nagios crashes the database
Posted: Wed May 19, 2021 10:38 am
Hi Team
Every time the Nagios service is restarted, the database crashes. It's been happening for the last few days.
I restarted Nagios because I noticed the last check time was not updating in the UI, even though there were no errors in the logs.
This is the crash error
210519 15:29:48 [ERROR] mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
I ran a full repair with the script below, but the crash still happens afterwards:
/usr/local/nagiosxi/scripts/repair_databases.sh
I also noticed Nagios does not stop correctly
● nagios.service - Nagios Core 4.4.6
Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
Active: failed (Result: signal) since Wed 2021-05-19 13:59:17 GMT; 53s ago
Docs: https://www.nagios.org/documentation
Process: 22398 ExecStopPost=/usr/bin/rm -f /usr/local/nagios/var/rw/nagios.cmd (code=exited, status=0/SUCCESS)
Process: 21918 ExecStop=/usr/bin/kill -s TERM ${MAINPID} (code=exited, status=0/SUCCESS)
Process: 10792 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
Process: 10505 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
Main PID: 10796 (code=killed, signal=KILL)
May 19 13:50:10 **********net nagios[10810]: job 45 (pid=16314): read() returned error 11
May 19 13:57:47 **********.net systemd[1]: Stopping Nagios Core 4.4.6...
May 19 13:57:47 **********.net nagios[10796]: Caught SIGTERM, shutting down...
May 19 13:57:47 **********.net nagios[10796]: Caught SIGTERM, shutting down...
May 19 13:57:47 **********.net nagios[10922]: Caught SIGTERM, shutting down...
May 19 13:59:17 **********.net systemd[1]: nagios.service stop-sigterm timed out. Killing.
May 19 13:59:17 **********.net systemd[1]: nagios.service: main process exited, code=killed, status=9/KILL
May 19 13:59:17 **********.net systemd[1]: Stopped Nagios Core 4.4.6.
May 19 13:59:17 **********.net systemd[1]: Unit nagios.service entered failed state.
May 19 13:59:17 **********.net systemd[1]: nagios.service failed.
Operating system
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Nagios version
Nagios XI 5.8.3
Thank you