nagios_downtimehistory is marked as crashed

brucej543
Posts: 134
Joined: Thu Jun 21, 2018 9:33 am

Re: nagios_downtimehistory is marked as crashed

Post by brucej543 »

more files
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: nagios_downtimehistory is marked as crashed

Post by tgriep »

Before 11pm, is someone logging in to the server and stopping MariaDB?

If so, they need to stop the Nagios processes and cron beforehand. Those processes are reading from and writing to the MySQL database, and if it is shut down without stopping them first, the database will be corrupted.

That is what I saw on the evening of the 28th.

If you do need to stop or restart the processes, do it in this order.

Code: Select all

systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
systemctl stop crond
pkill -9 -u nagios

systemctl restart mariadb

systemctl restart httpd
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
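
For anyone who wants to script the order above, here is a minimal sketch, not an official Nagios XI script. The `RUN` variable and the `restart_stack` function name are assumptions introduced for illustration. By default the commands are only printed (dry run), so the order can be reviewed before anything is stopped.

```shell
#!/bin/sh
# Sketch: replay the stop/start order from the post above.
# RUN is a hypothetical switch: it defaults to "echo", so commands are only printed.
# Set RUN="" (as root) before sourcing this file to actually execute them.
RUN="${RUN-echo}"

restart_stack() {
    # Stop everything that reads or writes the database first.
    for svc in npcd nagios ndo2db crond; do
        $RUN systemctl stop "$svc" || return 1
    done
    # Kill any leftover processes owned by the nagios user.
    # pkill exits non-zero when nothing matched, so its status is ignored.
    $RUN pkill -9 -u nagios || true

    # Only now restart the database, then the web server.
    $RUN systemctl restart mariadb || return 1
    $RUN systemctl restart httpd || return 1

    # Start the services again in the order given in the post.
    for svc in ndo2db nagios npcd crond; do
        $RUN systemctl start "$svc" || return 1
    done
}

# Usage: restart_stack
```

Keeping the dry run as the default is a deliberate choice here, since the sequence includes `pkill -9` and a database restart.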

The other times the database became corrupt, nothing was conclusive in the log files.
Be sure to check out our Knowledgebase for helpful articles and solutions!
brucej543
Posts: 134
Joined: Thu Jun 21, 2018 9:33 am

Re: nagios_downtimehistory is marked as crashed

Post by brucej543 »

There is no known process that shuts down Nagios at any time, and right now I am the only person who knows how to do that. I was online last night, watching the system, when it hung again. It took two hard reboots to get the system back up. It started to lock up around 10:45 PM and was fully locked up by 11:00 PM.

I am going to check when our enterprise backup starts backing up this system.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: nagios_downtimehistory is marked as crashed

Post by tgriep »

I was talking about MariaDB. That is what was shut down, not the Nagios process.

If the issue happens again, can you run the following commands as root before restarting the processes or rebooting, and post the output here?

Code: Select all

top -b -n 1
ps -ef --cols=300
Also, run the following commands, then upload the resulting /tmp/info.txt file to the ticket.

Code: Select all

mysql -u root -pnagiosxi -e "show global status like '%used_connections%'; show variables like 'max_connections';" >/tmp/info.txt
echo "SELECT table_schema as 'Database', table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES ORDER BY (data_length + index_length) DESC;" |mysql -t -u root -pnagiosxi >>/tmp/info.txt
brucej543
Posts: 134
Joined: Thu Jun 21, 2018 9:33 am

Re: nagios_downtimehistory is marked as crashed

Post by brucej543 »

Please note that the system did not hang last night, so there were no issues. It ran as expected.
Yesterday, I changed the log_warnings level for the DB from 1 to 4. I now see additional information regarding the downtimehistory issue.
Below are the downtime errors, showing the failure to create a new tempfile:
200731 7:00:02 [ERROR] mysqld: Table './nagios/nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
200731 7:00:02 [ERROR] mysqld: Table 'nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
200731 7:00:04 [ERROR] mysqld: Table './nagios/nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
200731 7:00:04 [ERROR] mysqld: Table 'nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
200731 7:00:04 [ERROR] nagios.nagios_logentries: Can't create new tempfile: '/var/lib/mysql/nagios/nagios_logentries.TMM'
200731 8:05:18 [ERROR] mysqld: Table './nagios/nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
200731 8:05:18 [ERROR] mysqld: Table 'nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
200731 8:05:18 [ERROR] nagios.nagios_logentries: Can't create new tempfile: '/var/lib/mysql/nagios/nagios_logentries.TMM'
200731 9:10:04 [ERROR] mysqld: Table './nagios/nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
200731 9:10:04 [ERROR] mysqld: Table 'nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
200731 9:10:04 [ERROR] nagios.nagios_logentries: Can't create new tempfile: '/var/lib/mysql/nagios/nagios_logentries.TMM'
200731 10:15:05 [ERROR] mysqld: Table './nagios/nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
200731 10:15:05 [ERROR] mysqld: Table 'nagios_downtimehistory' is marked as crashed and last (automatic?) repair failed
200731 10:15:05 [ERROR] nagios.nagios_logentries: Can't create new tempfile: '/var/lib/mysql/nagios/nagios_logentries.TMM'
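
A log like the one above gets long quickly, so a small filter can help summarize it. This is only a sketch: the function name `crashed_tables` is made up for illustration, and the log path in the usage comment is an assumption (use whatever path your MariaDB error log actually has). Note that each event appears twice in the log, once with the `./nagios/` prefix and once without, so the counts are doubled.

```shell
#!/bin/sh
# Sketch: read MariaDB error-log lines on stdin and print "count table"
# for every table reported as "marked as crashed".
crashed_tables() {
    grep "is marked as crashed" \
        | sed -e "s/.*Table '\([^']*\)' is marked as crashed.*/\1/" \
              -e 's|^\./[^/]*/||' \
        | sort | uniq -c | awk '{print $1, $2}'
}

# Usage (log path is an assumption):
#   crashed_tables < /var/log/mariadb/mariadb.log
```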

Here is a listing of the TMM files in the /var/lib/mysql/nagios directory. The nagios_logentries.TMM file is very large (about 1 GB).
[root@bcnagios01 nagios]# pwd
/var/lib/mysql/nagios
[root@bcnagios01 nagios]# ls -ltr | grep TMM
-rw-rw---- 1 mysql mysql 1071376384 Dec 19 2019 nagios_logentries.TMM
-rw-rw---- 1 mysql mysql 3108864 Jun 25 08:28 nagios_downtimehistory.TMM

Does this file need to be cleared?
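
The listing above shows a .TMM file dated months before the errors. As a hedged sketch, the helper below lists .TMM files in a directory that have not been modified for a given number of minutes, which may help spot leftovers like that. The function name `list_stale_tmm` and the 60-minute default are assumptions, and `-maxdepth`/`-mmin` assume GNU find. It only lists files and does not delete anything.

```shell
#!/bin/sh
# Sketch: list_stale_tmm DIR [MINUTES]
# Prints .TMM files directly inside DIR that were last modified more than
# MINUTES minutes ago (default 60). Listing only; nothing is removed.
list_stale_tmm() {
    dir="$1"
    minutes="${2:-60}"
    find "$dir" -maxdepth 1 -type f -name '*.TMM' -mmin "+$minutes" | sort
}

# Usage:
#   list_stale_tmm /var/lib/mysql/nagios
```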
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: nagios_downtimehistory is marked as crashed

Post by ssax »

Yes, clear the nagios_logentries table; the only data it contains is for Reports > Event Log.

Please still send the output of this command (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, -uroot, and -pnagiosxi options if your DB is offloaded to another server and/or you've changed the root MySQL password.

Code: Select all

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
See here for how to clean it up:

FAQ: Can I truncate the tables first before proceeding with database repair (if I have crashed tables)?

You can truncate before repairing the DB; it's up to you. If you want to back the data up first, you'll need to repair the DB before truncating. If you don't care about the data, or already have a backup, truncate first, as it will speed up the DB repair process.

NOTE: You may need to adjust the -h 127.0.0.1, -uroot, and -pnagiosxi options in the commands if your DB is hosted on a different server and/or you've changed the root MySQL password.

Truncating essentially drops and recreates the table with zero data in it, removing all historical data for the respective reports:

Code: Select all

nagios_logentries - Impacts Event Log report length

mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios -e 'TRUNCATE TABLE nagios_logentries;'

nagios_statehistory - Impacts the State History report length

mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios -e 'TRUNCATE TABLE nagios_statehistory;'

nagios_notifications - Impacts the Notifications report length

mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios -e 'TRUNCATE TABLE nagios_notifications;'


These should work to clean the DB tables up manually, provided the tables aren't crashed. If they ARE crashed, you will need to repair the database FIRST in order to run these queries:

nagios_logentries - Impacts Event Log report length

mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios -e 'DELETE FROM nagios_logentries WHERE logentry_time <= (NOW() - INTERVAL 6 MONTH);'

nagios_statehistory - Impacts the State History report length

mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios -e 'DELETE FROM nagios_statehistory WHERE state_time <= (NOW() - INTERVAL 6 MONTH);'

nagios_notifications - Impacts the Notifications report length

mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios -e 'DELETE FROM nagios_notifications WHERE start_time <= (NOW() - INTERVAL 6 MONTH);'
Then go to Admin > Performance Settings > Databases tab and adjust ALL of the retention intervals to match your business data policy. These settings control the retention on those DB tables and keep them cleaned up.
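
If a retention window other than 6 months is wanted for the manual DELETE commands above, the statement can be generated rather than hand-edited. This is just a sketch: `build_prune_sql` is a hypothetical helper, not part of Nagios XI, and it only prints the SQL so it can be reviewed before being passed to mysql.

```shell
#!/bin/sh
# Sketch: build_prune_sql TABLE TIME_COLUMN MONTHS
# Prints a DELETE statement in the same form as the commands above.
build_prune_sql() {
    # Refuse anything that is not a whole number of months.
    case "$3" in
        ''|*[!0-9]*) echo "months must be a whole number" >&2; return 1 ;;
    esac
    printf 'DELETE FROM %s WHERE %s <= (NOW() - INTERVAL %s MONTH);\n' "$1" "$2" "$3"
}

# Usage (same connection options as in the post; adjust as needed):
#   mysql -uroot -pnagiosxi -h 127.0.0.1 -B nagios \
#       -e "$(build_prune_sql nagios_statehistory state_time 3)"
```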

I would lower them to the smallest level you can accept and use the XI backup/restore process and the Admin > Scheduled Backups process to offload the backups to another server. Since these XI backups contain database backups, you can spin them up to retrieve the data and report on it if needed.

See here for more information:

https://assets.nagios.com/downloads/nag ... os-XI.pdf

And here:

https://assets.nagios.com/downloads/nag ... abase.pdf
brucej543
Posts: 134
Joined: Thu Jun 21, 2018 9:33 am

Re: nagios_downtimehistory is marked as crashed

Post by brucej543 »

Attached is the DB size listing. I performed all of the suggested table truncations and emptied /var/lib/mysql/nagios/nagios_logentries.TMM.
I lowered all of the DB retention settings.
Looking at the DB size listing, why is the nagios_downtimehistory size listed as NULL?
brucej543
Posts: 134
Joined: Thu Jun 21, 2018 9:33 am

Re: nagios_downtimehistory is marked as crashed

Post by brucej543 »

Errors in the DB log after truncating the tables and running the repair script:
Fri Jul 31 23:08:49 EDT 2020
200731 23:00:09 [ERROR] nagios.nagios_downtimehistory: Can't create new tempfile: '/var/lib/mysql/nagios/nagios_downtimehistory.TMM'
200731 23:00:09 [ERROR] nagios.nagios_logentries: Can't create new tempfile: '/var/lib/mysql/nagios/nagios_logentries.TMM'
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: nagios_downtimehistory is marked as crashed

Post by ssax »

Still crashed:

Code: Select all

| nagios_downtimehistory                     |       NULL |
Run these commands:

Code: Select all

cd /usr/local/nagiosxi/scripts
./repair_databases.sh
Are you running out of free space?

Code: Select all

df -h
df -i
Please send me a copy of your profile, you can download it from Admin > System Profile > Download Profile button.
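
In this thread a NULL in the size column is what marks the table as still crashed. As a rough sketch, the filter below picks such rows out of the `mysql --table` output of the size query posted earlier. The function name `null_size_tables` is an assumption, and it expects the two-column layout (table name, size) shown above.

```shell
#!/bin/sh
# Sketch: read `mysql --table` output on stdin (columns: Table | Size in MB)
# and print the names of tables whose size is reported as NULL.
null_size_tables() {
    awk -F'|' 'NF >= 4 {
        name = $2; size = $3
        # Trim the padding that --table adds around each cell.
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        gsub(/^[ \t]+|[ \t]+$/, "", size)
        if (size == "NULL") print name
    }'
}

# Usage: pipe the output of the size query above into null_size_tables
```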
brucej543
Posts: 134
Joined: Thu Jun 21, 2018 9:33 am

Re: nagios_downtimehistory is marked as crashed

Post by brucej543 »

Ran repair script:
=======================
nagios database repair succeeded
nagiosql database repair succeeded
nagiosxi database repair succeeded
Space listings - Both look OK
[root@bcnagios01 scripts]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 708M 15G 5% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/mapper/rootvg-root 3.7G 201M 3.3G 6% /
/dev/mapper/rootvg-usr 4.5G 3.1G 1.2G 73% /usr
/dev/sda1 453M 193M 233M 46% /boot
/dev/mapper/rootvg-sysadmin 2.1G 18M 2.0G 1% /sysadmin
/dev/mapper/rootvg-opt 3.3G 778M 2.4G 25% /opt
/dev/mapper/rootvg-rootvg--backup 86G 26G 57G 31% /store
/dev/mapper/rootvg-tmp 1.9G 8.8M 1.7G 1% /tmp
/dev/mapper/rootvg-usr_local 40G 17G 22G 43% /usr/local
/dev/mapper/rootvg-var 76G 9.5G 64G 13% /var
tmpfs 3.2G 0 3.2G 0% /run/user/0
tmpfs 3.2G 0 3.2G 0% /run/user/1003
tmpfs 3.2G 0 3.2G 0% /run/user/1744104356
[root@bcnagios01 scripts]# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
devtmpfs 4094336 494 4093842 1% /dev
tmpfs 4097220 1 4097219 1% /dev/shm
tmpfs 4097220 1432 4095788 1% /run
tmpfs 4097220 16 4097204 1% /sys/fs/cgroup
/dev/mapper/rootvg-root 244320 7061 237259 3% /
/dev/mapper/rootvg-usr 305216 129975 175241 43% /usr
/dev/sda1 121920 350 121570 1% /boot
/dev/mapper/rootvg-sysadmin 548864 454 548410 1% /sysadmin
/dev/mapper/rootvg-opt 219888 239 219649 1% /opt
/dev/mapper/rootvg-rootvg--backup 5701632 218 5701414 1% /store
/dev/mapper/rootvg-tmp 122160 599 121561 1% /tmp
/dev/mapper/rootvg-usr_local 2570240 47805 2522435 2% /usr/local
/dev/mapper/rootvg-var 4910160 145500 4764660 3% /var
tmpfs 4097220 1 4097219 1% /run/user/0
tmpfs 4097220 1 4097219 1% /run/user/1003
tmpfs 4097220 1 4097219 1% /run/user/1744104356
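
Reading the two listings above by eye works, but the check can also be automated. A minimal sketch follows: `full_filesystems` is a hypothetical helper and the default threshold of 90 is an assumption. It relies on the percentage being the fifth column, which holds for `df -P` and for the `df -i` output above, and it assumes mount points contain no spaces.

```shell
#!/bin/sh
# Sketch: full_filesystems [LIMIT]
# Reads df output on stdin and prints "mountpoint percent" for every
# filesystem whose usage (column 5) is at or above LIMIT (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)          # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Usage:
#   df -P | full_filesystems 90     # disk space
#   df -i | full_filesystems 90     # inodes
```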

I ran the database table listing again, and it now shows a size for nagios_downtimehistory:
| nagios_downtimehistory | 15.40 |

I will monitor the mariadb.log file for any errors.