nagios_downtimehistory is marked as crashed

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
brucej543
Posts: 134
Joined: Thu Jun 21, 2018 9:33 am

Post by brucej543 »

Hi. We are running Nagios XI 5.6.14. The system is crashing with MySQL failures while trying to update the table './nagios/nagios_downtimehistory'. We have a large number of servers (800+) in a recurring downtime. When the downtime stops (and starts), we get more than 18,900 errors in the /var/log/mariadb/mariadb.log and /var/log/messages files. When this occurs, all of the system's memory is used and the system crashes.
The temporary fix is to rerun the database repair script. From past issues, it seems that every time a downtime starts or ends for a server, there is a failure on the './nagios/nagios_downtimehistory' table.

As a side note, this issue also occurred last month, and I was asked to add max_connections = 1000 and open_files_limit = 4096 to the /etc/my.cnf file. I had to remove these entries because, when the system was restarted after it crashed, it would run out of memory within 10 minutes, causing a hard reboot.

Below are samples of the errors:

From the mariadb.log file:
200723 11:30:37 [ERROR] mysqld: Table './nagios/nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed.
From the messages log file:
Jul 22 23:02:34 bcnagios01 ndo2db: mysql_error: 'Table './nagios/nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed'
Jul 22 23:02:35 bcnagios01 ndo2db: Error: mysql_query() failed for 'UPDATE nagios_downtimehistory SET actual_start_time=FROM_UNIXTIME(1595473200), actual_start_time_usec='952661', was_started='1' WHERE instance_id='1' AND downtime_type='1' AND object_id='6263' AND entry_time=FROM_UNIXTIME(1594954867) AND scheduled_start_time=FROM_UNIXTIME(1595473200) AND scheduled_end_time=FROM_UNIXTIME(1595493000)'
Jul 22 23:02:35 bcnagios01 ndo2db: mysql_error: 'Table './nagios/nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed'
Jul 22 23:02:35 bcnagios01 ndo2db: Error: mysql_query() failed for 'UPDATE nagios_downtimehistory SET actual_start_time=FROM_UNIXTIME(1595473200), actual_start_time_usec='961094', was_started='1' WHERE instance_id='1' AND downtime_type='2' AND object_id='10606' AND entry_time=FROM_UNIXTIME(1594954872) AND scheduled_start_time=FROM_UNIXTIME(1595473200) AND
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: nagios_downtimehistory is marked as crashed

Post by tgriep »

I suspect that when the memory is fully used, the out-of-memory killer in the Linux kernel kills the MySQL database, since it is using the most memory, and that prevents the database repair from finishing.
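One way to check this theory is to search the system log for kernel out-of-memory messages. The following is only a minimal sketch: the function name `find_oom_events` is made up for illustration, the default path is the /var/log/messages file mentioned earlier in this thread, and the search patterns assume the usual kernel wording ("Out of memory", "oom-killer"), which can vary between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios XI): print log lines that suggest
# the kernel OOM killer was invoked.
# $1 = log file to search; defaults to /var/log/messages (an assumption).
find_oom_events() {
    log_file="${1:-/var/log/messages}"
    # Case-insensitive match on common OOM killer message fragments.
    grep -i -E 'out of memory|oom-killer|oom_kill' "$log_file"
}
```

For example, `find_oom_events /var/log/messages | grep -i mysqld` would narrow the output to lines that mention mysqld. If nothing is printed, the log that was searched does not show the OOM killer acting on the database.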

Run these commands to stop the processes, clean and repair the SQL database and to restart the processes. Run them all as root. Show all of the output.

Code: Select all

service npcd stop
service nagios stop
service ndo2db stop
service crond stop
pkill -9 -u nagios
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -u root -pnagiosxi nagiosxi
mysqlcheck -f -r -u root -pnagiosxi --all-databases --use-frm
if grep --quiet pgsql /usr/local/nagiosxi/html/config.inc.php; then service postgresql stop; fi;
service mysqld restart
rm -f /usr/local/nagios/var/rw/nagios.cmd
rm -f /usr/local/nagios/var/nagios.lock
rm -f /var/run/nagios.lock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /var/lib/mrtg/mrtg_l
rm -f /usr/local/nagiosxi/var/*.lock
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
pkill python
if grep --quiet pgsql /usr/local/nagiosxi/html/config.inc.php; then service postgresql start; fi;
service httpd restart
service ndo2db start
service nagios start
service npcd start
service crond start
Hopefully that will free up enough memory to allow the repair to finish.
Let us know if this does repair the database.

When it is done, run the following as root and attach the /tmp/info.txt file to your reply so we can get some stats from the server.

Code: Select all

mysql -u root -pnagiosxi -e "show global status like '%used_connections%'; show variables like 'max_connections';" >/tmp/info.txt
echo "SELECT table_schema as 'Database', table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES ORDER BY (data_length + index_length) DESC;" |mysql -t -u root -pnagiosxi >>/tmp/info.txt
top -b -n 1 >>/tmp/info.txt
df -h >>/tmp/info.txt
df -i >>/tmp/info.txt
ps aux >>/tmp/info.txt
Thanks
Be sure to check out our Knowledgebase for helpful articles and solutions!
brucej543
Posts: 134
Joined: Thu Jun 21, 2018 9:33 am

Re: nagios_downtimehistory is marked as crashed

Post by brucej543 »

Attached is the requested information
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: nagios_downtimehistory is marked as crashed

Post by tgriep »

Thanks for the files. All of the commands ran to completion and no errors were generated.
Is the system behaving better now and not generating the errors in the messages file?
brucej543
Posts: 134
Joined: Thu Jun 21, 2018 9:33 am

Re: nagios_downtimehistory is marked as crashed

Post by brucej543 »

We are still getting the errors below in mariadb.log:
200724 9:55:04 [ERROR] mysqld: Table './nagios/nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
200724 11:00:04 [ERROR] mysqld: Table './nagios/nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
200724 12:05:17 [ERROR] mysqld: Table './nagios/nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
200724 13:10:18 [ERROR] mysqld: Table './nagios/nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
200724 14:15:04 [ERROR] mysqld: Table './nagios/nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
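To see which tables are affected and how often, the log can be tallied per table. This is a rough sketch that assumes log lines in the format shown above; the function name `count_crashed_tables` is hypothetical, and the default path is the /var/log/mariadb/mariadb.log file mentioned earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: count "is marked as crashed" errors per table.
# $1 = MariaDB error log; defaults to /var/log/mariadb/mariadb.log (an assumption).
count_crashed_tables() {
    log_file="${1:-/var/log/mariadb/mariadb.log}"
    grep 'is marked as crashed' "$log_file" \
        | sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p" \
        | sort | uniq -c | sort -rn   # most frequently reported table first
}
```

Each output line is a count followed by the table path as it appears in the log, so a table that keeps coming back after a repair stands out at the top.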
brucej543
Posts: 134
Joined: Thu Jun 21, 2018 9:33 am

Re: nagios_downtimehistory is marked as crashed

Post by brucej543 »

I just ran another database repair. We will see if the errors show up again. I have not seen any errors in the messages file since I ran the steps you provided. I will also keep monitoring the messages file.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: nagios_downtimehistory is marked as crashed

Post by tgriep »

Let us know what you find out.
brucej543
Posts: 134
Joined: Thu Jun 21, 2018 9:33 am

Re: nagios_downtimehistory is marked as crashed

Post by brucej543 »

Ran another database repair at 0700 this morning after finding that './nagios/nagios_logentries' is marked as crashed in the DB log file.
Two hours later, we are now getting the following error every 65 minutes: 200727 9:20:04 [ERROR] mysqld: Table './nagios/nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed. See the attached file.
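The spacing between errors can be checked directly from the log, which may help match the interval to a scheduled job or a recurring downtime window. This is a sketch only: it assumes the two leading fields of each line are a YYMMDD date and an H:MM:SS time, as in the lines quoted in this thread, and it only compares entries from the same day. The function name `crash_error_gaps` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the gap in whole minutes (truncated) between
# consecutive "is marked as crashed" errors logged on the same day.
# $1 = MariaDB error log; defaults to /var/log/mariadb/mariadb.log (an assumption).
crash_error_gaps() {
    log_file="${1:-/var/log/mariadb/mariadb.log}"
    grep 'is marked as crashed' "$log_file" | awk '
        {
            # $1 is the date (YYMMDD), $2 is the time (H:MM:SS)
            split($2, t, ":")
            secs = t[1] * 3600 + t[2] * 60 + t[3]
            # Gaps across midnight are skipped to keep the sketch simple.
            if (NR > 1 && $1 == prev_day)
                printf "%s %s  +%d min\n", $1, $2, (secs - prev_secs) / 60
            prev_day = $1
            prev_secs = secs
        }'
}
```

Each output line shows the date and time of an error and the number of minutes since the previous one.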
brucej543
Posts: 134
Joined: Thu Jun 21, 2018 9:33 am

Re: nagios_downtimehistory is marked as crashed

Post by brucej543 »

Just had 64 errors logged at 13:44 in the DB log, all for nagios_downtimehistory. Running the DB repair again.
Please help us resolve this issue. We can't keep running this repair job multiple times a day.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: nagios_downtimehistory is marked as crashed

Post by tgriep »

Let's increase the MySQL max_connections setting to see if that resolves the issue.
See this article for instructions.
https://support.nagios.com/kb/article/n ... s-513.html

If the max_connections limit is hit, it can cause database corruption, and this may be what is causing the issue on your server.