Problem with Nagios database not starting

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
DFaught
Posts: 62
Joined: Tue Sep 26, 2017 12:50 pm

Problem with Nagios database not starting

Post by DFaught »

Attached is the System Profile. We have tried rebooting the server and the database does not start.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Problem with Nagios database not starting

Post by npolovenko »

@DFaught, I did not see any database logs in your profile. Depending on your installation and database type, the logs should be in one of these locations:
/var/log/mysqld.log
/var/log/mariadb/mariadb.log
/var/lib/pgsql/data/
Please check, and if any of these files exist, upload them.
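A quick way to check all three locations at once is a small loop like the one below. This is only a sketch: the function name find_db_logs is made up for illustration, and the paths are simply the ones listed above.

```shell
# Report which of the candidate database log locations exist on this host.
# find_db_logs is an illustrative helper name, not a Nagios XI tool.
find_db_logs() {
    for path in "$@"; do
        if [ -e "$path" ]; then
            echo "found: $path"
        else
            echo "missing: $path"
        fi
    done
}

find_db_logs /var/log/mysqld.log /var/log/mariadb/mariadb.log /var/lib/pgsql/data/
```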

With that said, I already see a possible reason for the DB outage. Your /usr/local/nagios partition is 100% full.
/dev/mapper/datavg-usr_local_nagios 25G 25G 5.7M 100% /usr/local/nagios
This likely crashed the database. Please run the following command to list the 10 largest files on that partition:
find /usr/local/nagios -type f -print0 | xargs -0 du | sort -n | tail -10 | cut -f2 | xargs -I{} du -sh {}
Once you have freed up some space, run the MySQL repair command:
mysqlcheck -r -f -uroot -pnagiosxi --all-databases
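Before running the repair, it may be worth confirming that the partition actually has room again, since a repair on a full filesystem is likely to fail. A minimal sketch, assuming a POSIX df; the function names fs_use_pct and check_space and the 90% threshold are illustrative choices, not part of Nagios XI:

```shell
# Print the use percentage (digits only) of the filesystem holding a path.
# "df -P" is the POSIX output format: one line per filesystem, capacity in column 5.
fs_use_pct() {
    df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 if usage is below the limit, 1 if at or above it, 2 if unreadable.
# The default limit of 90 is an arbitrary illustrative threshold.
check_space() {
    path=$1
    limit=${2:-90}
    pct=$(fs_use_pct "$path")
    case "$pct" in
        ''|*[!0-9]*)
            echo "could not read disk usage for $path" >&2
            return 2
            ;;
    esac
    if [ "$pct" -ge "$limit" ]; then
        echo "$path is ${pct}% full: free up space before repairing"
        return 1
    fi
    echo "$path is ${pct}% full: OK to continue"
}

# Example: check_space /usr/local/nagios && mysqlcheck -r -f -uroot -pnagiosxi --all-databases
```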
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
DFaught
Posts: 62
Joined: Tue Sep 26, 2017 12:50 pm

Re: Problem with Nagios database not starting

Post by DFaught »

Isn't that the mariadblog.txt file in the profile archive? Anyway, we added some space to the two filesystems that were full, ran mysqlcheck successfully, and then rebooted the server. It still seems to have an issue. I have attached another system profile taken after these steps.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Problem with Nagios database not starting

Post by npolovenko »

@DFaught, it looks like one table is still marked as crashed. Please run the following command to repair it:
echo 'repair table nagios_logentries use_frm;' | mysql -t -u root -pnagiosxi nagios
Then please run all of the following commands in order. If you get any errors, please post them in this thread:
service nagios stop
service ndo2db stop
service mariadb stop
service crond stop
service httpd stop
killall -9 nagios
killall -9 ndo2db
rm -f /usr/local/nagios/var/rw/nagios.cmd
rm -f /usr/local/nagios/var/nagios.lock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /usr/local/nagiosxi/var/reconfigure_nagios.lock
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
service mariadb start
service ndo2db start
service nagios start
service httpd start
service crond start
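If the whole list is pasted into a terminal at once, the later lines can be echoed or consumed while earlier commands are still running, which makes the output hard to follow. One option is to save the commands to a file and feed them through a small runner that echoes each command before running it. A minimal sketch; run_steps and the file names steps.txt and restart.log are hypothetical:

```shell
# Read one command per line from stdin, echo it, run it, and flag failures.
# Each command gets /dev/null as stdin so it cannot swallow the remaining lines.
run_steps() {
    while IFS= read -r cmd; do
        [ -n "$cmd" ] || continue
        echo "+ $cmd"
        sh -c "$cmd" </dev/null || echo "FAILED ($?): $cmd"
    done
}

# Example: run_steps < steps.txt 2>&1 | tee restart.log
```

Note that killall exits non-zero when no matching process is running, so a FAILED line for the two killall steps is expected if the services stopped cleanly.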
DFaught
Posts: 62
Joined: Tue Sep 26, 2017 12:50 pm

Re: Problem with Nagios database not starting

Post by DFaught »

Unfortunately, there were still problems. Here is the console log from this session:

[root@mlwnag21]:[/root]# echo 'repair table nagios_logentries use_frm;' | mysql -t -u root -pnagiosxi nagios
+--------------------------+--------+----------+-------------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+--------------------------+--------+----------+-------------------------------------------------------------+
| nagios.nagios_logentries | repair | error | Can't create new tempfile: './nagios/nagios_logentries.TMD' |
| nagios.nagios_logentries | repair | status | Operation failed |
+--------------------------+--------+----------+-------------------------------------------------------------+

[root@mlwnag21]:[/root]# service nagios stop
Stopping nagios (via systemctl): [ OK ]
[root@mlwnag21]:[/root]# service ndo2db stop
Stopping ndo2db (via systemctl): [ OK ]
[root@mlwnag21]:[/root]# service mariadb stop
Redirecting to /bin/systemctl stop mariadb.service
[root@mlwnag21]:[/root]# service crond stop
Redirecting to /bin/systemctl stop crond.service
[root@mlwnag21]:[/root]# service httpd stop
Redirecting to /bin/systemctl stop httpd.service
[root@mlwnag21]:[/root]# killall -9 nagios
nagios: no process found
[root@mlwnag21]:[/root]# killall -9 ndo2db
ndo2db: no process found
[root@mlwnag21]:[/root]# rm -f /usr/local/nagios/var/rw/nagios.cmd
[root@mlwnag21]:[/root]# rm -f /usr/local/nagios/var/nagios.lock
[root@mlwnag21]:[/root]# rm -f /usr/local/nagios/var/ndo.sock
[root@mlwnag21]:[/root]# rm -f /usr/local/nagios/var/ndo2db.lock
[root@mlwnag21]:[/root]# rm -f /usr/local/nagiosxi/var/reconfigure_nagios.lock
[root@mlwnag21]:[/root]# for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
[root@mlwnag21]:[/root]# service mariadb start
Redirecting to /bin/systemctl start mariadb.service
[root@mlwnag21]:[/root]# service ndo2db start
Starting ndo2db (via systemctl): [ OK ]
[root@mlwnag21]:[/root]# service nagios start
Starting nagios (via systemctl): Job for nagios.service failed because the control process exited with error code. See "systemctl status nagios.service" and "journalctl -xe" for details.
[FAILED]
[root@mlwnag21]:[/root]# service httpd start
Redirecting to /bin/systemctl start httpd.service
[root@mlwnag21]:[/root]# service crond start
Redirecting to /bin/systemctl start crond.service

[root@mlwnag21]:[/root]# journalctl -xe
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Error: There are no contacts defined!
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 contacts.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 contact groups.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 commands.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 time periods.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 host escalations.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 service escalations.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checking for circular paths...
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 hosts
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 service dependencies
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 host dependencies
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checked 0 timeperiods
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checking global event handlers...
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checking obsessive compulsive processor commands...
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Checking misc settings...
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Warning: Nothing specified for illegal_macro_output_chars variable!
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Total Warnings: 1
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Total Errors: 3
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: ***> One or more problems was encountered while running the pre-flight check...
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: Check your configuration file(s) to ensure that they contain valid
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: directives and data defintions. If you are upgrading from a previous
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: version of Nagios, you should be aware that some variables/definitions
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: may have been removed or modified in this version. Make sure to read
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: the HTML documentation regarding the config files, as well as the
Oct 31 16:20:46 mlwnag21.corp.footlocker.net nagios[48516]: 'Whats New' section to find out what has changed.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net systemd[1]: nagios.service: control process exited, code=exited status=8
Oct 31 16:20:46 mlwnag21.corp.footlocker.net systemd[1]: Failed to start LSB: Starts and stops the Nagios monitoring server.
-- Subject: Unit nagios.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/li ... temd-devel
--
-- Unit nagios.service has failed.
--
-- The result is failed.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net systemd[1]: Unit nagios.service entered failed state.
Oct 31 16:20:46 mlwnag21.corp.footlocker.net polkitd[1033]: Unregistered Authentication Agent for unix-process:48510:512083 (system bus name :1.1856,
Oct 31 16:20:46 mlwnag21.corp.footlocker.net systemd[1]: nagios.service failed.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Problem with Nagios database not starting

Post by npolovenko »

@DFaught, Let's try the following commands instead:
service mysqld stop
cd /var/lib/mysql/nagios
myisamchk -r -f *
cd /var/lib/mysql/nagiosql
myisamchk -r -f *
cd /var/lib/mysql/nagiosxi
myisamchk -r -f *
service mysqld start
rm -f /usr/local/nagiosxi/var/dbmaint.lock
/usr/bin/php /usr/local/nagiosxi/cron/dbmaint.php
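The same repair can also be written as a loop over the three databases. This is a sketch under a few assumptions: it targets the *.MYI index files, which is the form the myisamchk documentation uses, rather than every file in each directory; it passes --tmpdir so that the temporary files myisamchk writes during a repair land on a filesystem with enough room; and repair_dbs, the MYISAMCHK override, and /path/with/space are illustrative names. Stop the database service first, as in the commands above.

```shell
# Assumption: MYISAMCHK can be overridden (for example with "echo") for a dry run.
MYISAMCHK=${MYISAMCHK:-myisamchk}

# Usage: repair_dbs DATADIR TMPDIR DB [DB...]
# Runs a forced recovery on every MyISAM index file in each named database.
repair_dbs() {
    datadir=$1
    tmpdir=$2
    shift 2
    for db in "$@"; do
        for idx in "$datadir/$db"/*.MYI; do
            # Skip databases that are missing or contain no MyISAM tables.
            [ -e "$idx" ] || continue
            $MYISAMCHK -r -f --tmpdir="$tmpdir" "$idx"
        done
    done
}

# Example (database stopped):
#   repair_dbs /var/lib/mysql /path/with/space nagios nagiosql nagiosxi
```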
DFaught
Posts: 62
Joined: Tue Sep 26, 2017 12:50 pm

Re: Problem with Nagios database not starting

Post by DFaught »

I substituted mariadb for mysqld in those commands. The myisamchk run filled up /var/tmp, so more space was added there. Various log files are attached.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Problem with Nagios database not starting

Post by npolovenko »

@DFaught, Looks like the database started. Please reboot the server with:
shutdown -r now
Then let me know what issues you are still seeing in XI, and upload a new system profile.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Problem with Nagios database not starting

Post by npolovenko »

I see you have opened a ticket for the same issue in our ticketing system. I'm going to close this thread so that we can focus our efforts. Thank you!