Problem with Nagios database not starting

Posted: Wed Oct 31, 2018 11:25 am
by DFaught
Attached is the System Profile. We have tried rebooting the server, but the database still does not start.

Re: Problem with Nagios database not starting

Posted: Wed Oct 31, 2018 1:28 pm
by npolovenko
@DFaught, I did not see any database logs in your profile. Depending on your installation and database type, the logs should be in one of these locations:
/var/log/mysqld.log
/var/log/mariadb/mariadb.log
/var/lib/pgsql/data/
Please check, and if you find any of these files, upload them.

With that said, I already see a possible reason for the DB outage. Your /usr/local/nagios partition is 100% full.
/dev/mapper/datavg-usr_local_nagios 25G 25G 5.7M 100% /usr/local/nagios
This likely crashed the database. Please run the following command to list the 10 largest files under /usr/local/nagios:
find /usr/local/nagios -type f -print0 | xargs -0 du | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}
Once you have freed up space, run the MySQL repair command:
mysqlcheck -r -f -uroot -pnagiosxi --all-databases
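
Before running the repair, it may help to confirm that no filesystem is still full. The following is only a sketch, assuming POSIX-style `df -P` output (six columns, mount point last, no spaces in mount paths). The function name `full_filesystems` and the 95% threshold are illustrative and not part of Nagios XI:

```shell
# Read `df -P` style output on stdin and print every mount point whose
# usage is at or above the given percentage (default 95).
# Assumption: column 5 is the capacity ("100%") and column 6 the mount point.
full_filesystems() {
    threshold="${1:-95}"
    awk -v t="$threshold" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)          # strip the trailing percent sign
            if (pct + 0 >= t) print $6, pct "%"
        }'
}
```

It could be used as `df -P | full_filesystems 95`; empty output would suggest that nothing is above the threshold.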

Re: Problem with Nagios database not starting

Posted: Wed Oct 31, 2018 2:39 pm
by DFaught
Isn't that the mariadblog.txt file in the profile archive? Anyway, we added some space to the two filesystems that were full, ran the mysqlcheck successfully, and then rebooted the server. It still seems to have an issue. I have attached another system profile taken after these steps.

Re: Problem with Nagios database not starting

Posted: Wed Oct 31, 2018 3:05 pm
by npolovenko
@DFaught, Looks like one table is still crashed. Please run the following command to repair the table:
echo 'repair table nagios_logentries use_frm;' | mysql -t -u root -pnagiosxi nagios
Then please run all of the following commands in order. If you get any errors, please post them in the thread:
service nagios stop
service ndo2db stop
service mariadb stop
service crond stop
service httpd stop
killall -9 nagios
killall -9 ndo2db
rm -f /usr/local/nagios/var/rw/nagios.cmd
rm -f /usr/local/nagios/var/nagios.lock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /usr/local/nagiosxi/var/reconfigure_nagios.lock
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
service mariadb start
service ndo2db start
service nagios start
service httpd start
service crond start
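
Pasting a long command list into a terminal in one go can interleave commands with their output, so one option is to run the list from a file, one line at a time. The following is a rough sketch, not an official Nagios XI tool. The function name `run_steps` and the file name `steps.txt` are hypothetical. It keeps going after a failure, because some steps (for example `killall` when no process is running) are expected to return a non-zero status:

```shell
# Run each non-empty line from stdin as a shell command, in order.
# Prints every command before running it, notes any non-zero exit status,
# and returns non-zero itself if at least one step failed.
run_steps() {
    failed=0
    while IFS= read -r step; do
        [ -z "$step" ] && continue
        printf '==> %s\n' "$step"
        # </dev/null keeps the step from consuming the remaining lines.
        sh -c "$step" </dev/null 2>&1
        status=$?
        if [ "$status" -ne 0 ]; then
            printf '    exit status %s\n' "$status"
            failed=$((failed + 1))
        fi
    done
    printf '%s step(s) returned a non-zero status\n' "$failed"
    [ "$failed" -eq 0 ]
}
```

With the commands above saved to a file, `run_steps < steps.txt 2>&1 | tee steps.log` would leave a log that can be attached to the thread.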

Re: Problem with Nagios database not starting

Posted: Wed Oct 31, 2018 3:58 pm
by DFaught
Unfortunately, there were still problems. Here is the console log from this session:

[root@mlwnag21]:[/root]# echo 'repair table nagios_logentries use_frm;' | mysql -t -u root -pnagiosxi nagios
+--------------------------+--------+----------+-------------------------------------------------------------+
| Table                    | Op     | Msg_type | Msg_text                                                    |
+--------------------------+--------+----------+-------------------------------------------------------------+
| nagios.nagios_logentries | repair | error    | Can't create new tempfile: './nagios/nagios_logentries.TMD' |
| nagios.nagios_logentries | repair | status   | Operation failed                                            |
+--------------------------+--------+----------+-------------------------------------------------------------+

[root@mlwnag21]:[/root]# service nagios stop
service ndo2db stop
service mariadb stop
service crond stop
service httpd stop
killall -9 nagios
killall -9 ndo2db
rm -f /usr/local/nagios/var/rw/nagios.cmd
rm -f /usr/local/nagios/var/nagios.lock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagios/var/ndo2db.lock
Stopping nagios (via systemctl): rm -f /usr/local/nagiosxi/var/reconfigure_nagios.lock
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
service mariadb start
service ndo2db start
[ OK ]
[root@mlwnag21]:[/root]# service ndo2db stop
service nagios start
service httpd start
service crond start
Stopping ndo2db (via systemctl): [ OK ]
[root@mlwnag21]:[/root]# service mariadb stop
Redirecting to /bin/systemctl stop mariadb.service
[root@mlwnag21]:[/root]# service crond stop
Redirecting to /bin/systemctl stop crond.service
[root@mlwnag21]:[/root]# service httpd stop
Redirecting to /bin/systemctl stop httpd.service
[root@mlwnag21]:[/root]# killall -9 nagios
nagios: no process found
[root@mlwnag21]:[/root]# killall -9 ndo2db
ndo2db: no process found
[root@mlwnag21]:[/root]# rm -f /usr/local/nagios/var/rw/nagios.cmd
[root@mlwnag21]:[/root]# rm -f /usr/local/nagios/var/nagios.lock
[root@mlwnag21]:[/root]# rm -f /usr/local/nagios/var/ndo.sock
[root@mlwnag21]:[/root]# rm -f /usr/local/nagios/var/ndo2db.lock
[root@mlwnag21]:[/root]# rm -f /usr/local/nagiosxi/var/reconfigure_nagios.lock
[root@mlwnag21]:[/root]# for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
[root@mlwnag21]:[/root]# service mariadb start
Redirecting to /bin/systemctl start mariadb.service
[root@mlwnag21]:[/root]# service ndo2db start
Starting ndo2db (via systemctl): [ OK ]
[root@mlwnag21]:[/root]# service nagios start
Starting nagios (via systemctl): Job for nagios.service failed because the control process exited with error code. See "systemctl status nagios.service" and "journalctl -xe" for details.
[FAILED]
[root@mlwnag21]:[/root]# service httpd start
Redirecting to /bin/systemctl start httpd.service
[root@mlwnag21]:[/root]# service crond start
Redirecting to /bin/systemctl start crond.service

[root@mlwnag21]:[/root]# journalctl -xe
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Error: There are no contacts defined!
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 contacts.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 contact groups.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 commands.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 time periods.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 host escalations.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 service escalations.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checking for circular paths...
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 hosts
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 service dependencies
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 host dependencies
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 timeperiods
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checking global event handlers...
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checking obsessive compulsive processor commands...
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checking misc settings...
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Warning: Nothing specified for illegal_macro_output_chars variable!
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Total Warnings: 1
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Total Errors: 3
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: ***> One or more problems was encountered while running the pre-flight check...
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Check your configuration file(s) to ensure that they contain valid
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: directives and data defintions. If you are upgrading from a previous
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: version of Nagios, you should be aware that some variables/definitions
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: may have been removed or modified in this version. Make sure to read
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: the HTML documentation regarding the config files, as well as the
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: 'Whats New' section to find out what has changed.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net systemd[1]: nagios.service: control process exited, code=exited status=8
Oct 31 16:20:46 mlwnag21.corp.footlocker.net systemd[1]: Failed to start LSB: Starts and stops the Nagios monitoring server.
-- Subject: Unit nagios.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/li ... temd-devel
--
-- Unit nagios.service has failed.
--
-- The result is failed.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net systemd[1]: Unit nagios.service entered failed state.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net polkitd[1033]: Unregistered Authentication Agent for unix-process:48510:512083 (system bus name :1.1856,
Oct 31 16:20:46 mlwnag21.corp.footlocker.net systemd[1]: nagios.service failed.
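
The journal output above comes from the Nagios pre-flight check, which can also be run by hand with `nagios -v` followed by the main configuration file. As a small sketch, the helper below pulls the error count out of that output. The name `preflight_errors` is made up for illustration, and the paths in the usage note are the usual Nagios XI defaults, assumed rather than confirmed for this server:

```shell
# Read Nagios pre-flight (or journalctl) output on stdin and print the
# number that follows the last "Total Errors:" line.
# Prints nothing if no such line is present.
preflight_errors() {
    sed -n 's/.*Total Errors:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | tail -n 1
}
```

For example: `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_errors`.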

Re: Problem with Nagios database not starting

Posted: Wed Oct 31, 2018 4:17 pm
by npolovenko
@DFaught, Let's try the following commands instead:
service mysqld stop
cd /var/lib/mysql/nagios
myisamchk -r -f *
cd /var/lib/mysql/nagiosql
myisamchk -r -f *
cd /var/lib/mysql/nagiosxi
myisamchk -r -f *
service mysqld start
rm -f /usr/local/nagiosxi/var/dbmaint.lock
/usr/bin/php /usr/local/nagiosxi/cron/dbmaint.php

Re: Problem with Nagios database not starting

Posted: Wed Oct 31, 2018 4:57 pm
by DFaught
I substituted mariadb for mysqld in the service commands. The myisamchk run filled up /var/tmp, so more space was added. Various log files are attached.
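
Since myisamchk filled /var/tmp here, a quick free-space check on the temporary directory before a repair may save a round trip. This is a minimal sketch assuming POSIX `df -Pk` output. The function name `has_room` and the size used in the usage note are illustrative only:

```shell
# Succeed if the filesystem holding DIR has at least REQUIRED_KB
# kilobytes available.
# Usage: has_room DIR REQUIRED_KB
has_room() {
    # Column 4 of the second line of `df -Pk` is the available space in KB.
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "${avail:-0}" -ge "$2" ]
}
```

For example, `has_room /var/tmp 10485760 || echo "less than 10 GB free in /var/tmp"`. If space there is tight, myisamchk also accepts `--tmpdir=` to place its temporary files on another filesystem.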

Re: Problem with Nagios database not starting

Posted: Thu Nov 01, 2018 9:58 am
by npolovenko
@DFaught, it looks like the database has started. Please reboot the server with:
shutdown -r now
Then let me know what issues you're noticing in XI, and upload a new system profile.

Re: Problem with Nagios database not starting

Posted: Thu Nov 01, 2018 10:45 am
by npolovenko
I see you have opened a ticket for the same issue in our ticketing system. I'm going to close this thread so that we can focus our efforts. Thank you!