table nagios_notifications crashed (in every hour)
Posted: Fri Nov 08, 2019 3:46 am
by eycklin
Hi,
In my Nagios XI 5.6.6 environment, the DB table nagios_notifications crashes about every 30 to 60 minutes,
even after I repaired it (following the document Repairing The Nagios XI Database.pdf).
Is there any way to fix it?
Also, after issuing the following command:
echo 'repair table nagios_systemcommands use_frm;' | mysql -t -u root -pnagiosxi nagios
I see a message like "Number of rows changed from 0 to 9044" (the number varies).
Is that normal?
Installation environment:
OS: CentOS 7, 64-bit.
Physical host.
Installed from the CentOS 7 minimal ISO, then ran the fullinstall script from Nagios XI 5.6.6.
Thanks,
Eyck Lin
Re: table nagios_notifications crashed (in every hour)
Posted: Fri Nov 08, 2019 11:33 am
by benjaminsmith
Hello Eyck,
How long has this been happening, and did you make any recent changes to the server? I'd like to review the logs in the system profile to troubleshoot the error.
To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file, share it in a private message, and then reply to this post to bring it up in the queue.
Additionally, please post the output of the following query so we can check the size of the database tables. Thanks.
Code: Select all
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
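The `--table` output of the size query can be filtered to show only the tables above a given size. The following is a minimal sketch, not part of the original reply: `large_tables` is a hypothetical helper name, and the default threshold of 100 MB is an arbitrary assumption.

```shell
# Hypothetical helper (not from the thread): reads the "--table" output of
# the size query on stdin and prints tables larger than a threshold in MB.
large_tables() {
    local threshold="${1:-100}"   # assumed default of 100 MB
    awk -F'|' -v limit="$threshold" '
        NF >= 3 {
            name = $2; size = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", size)
            # Border lines, the header row and NULL sizes are skipped
            # because their size cell is not a plain number.
            if (size ~ /^[0-9]+(\.[0-9]+)?$/ && size + 0 > limit + 0)
                print name, size
        }'
}
```

To use it, append `| large_tables 100` to the query pipeline above.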
Re: table nagios_notifications crashed (in every hour)
Posted: Mon Nov 11, 2019 1:01 am
by eycklin
Hi,
Please see the attachment.
Thanks,
Eyck Lin
Support edit: data_collect_20181111.zip downloaded and shared with team
Re: table nagios_notifications crashed (in every hour)
Posted: Mon Nov 11, 2019 1:58 pm
by tgriep
Run this procedure as root to stop all of the processes, clear out the temporary data, repair the database, and restart the processes.
Code: Select all
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
systemctl stop crond
pkill -9 -u nagios
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -u root -pnagiosxi nagiosxi
mysqlcheck -f -r -u root -pnagiosxi --all-databases --use-frm
if grep --quiet pgsql /usr/local/nagiosxi/html/config.inc.php; then systemctl stop postgresql; fi;
systemctl restart mariadb
rm -f /usr/local/nagios/var/rw/nagios.cmd
rm -f /usr/local/nagios/var/nagios.lock
rm -f /var/run/nagios.lock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /var/lib/mrtg/mrtg_l
rm -f /usr/local/nagiosxi/var/*.lock
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
pkill python
if grep --quiet pgsql /usr/local/nagiosxi/html/config.inc.php; then service postgresql start; fi;
systemctl restart httpd
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
These tables only hold temporary data and should not be as large as they are (sizes below are in MB), so please run the above procedure.
Code: Select all
| xi_events | 224.80 |
| xi_meta | 4859.86 |
Re: table nagios_notifications crashed (in every hour)
Posted: Wed Nov 13, 2019 12:49 am
by eycklin
Hi,
We ran the procedure at 17:30 on 11/12,
and the table crashed again at about 18:10.
The procedure log is in log20191112.log.
The error log is in log20191113.zip.
We also included the profile log in 20191113.zip.
Please give us some advice.
Thanks,
Eyck Lin
Re: table nagios_notifications crashed (in every hour)
Posted: Wed Nov 13, 2019 11:55 am
by tgriep
One of your notification messages is very large, and the table gets corrupted when that message is logged to the MySQL table.
Running this command will increase the size of the output field in the nagios_notifications table so the large output will fit.
Code: Select all
echo "alter table nagios_notifications modify output text NOT NULL;" | mysql -uroot -pnagiosxi nagios
Doing the above may cause issues with displaying the notification data in the reports, so make the change at your own risk.
BTW, these are the details of the host and service with the large output.
Nov 12 18:09:23 localhost nagios: SERVICE NOTIFICATION: jenny_lin;TT1-A-Core-HP7506;Hardware Health;CRITICAL;xi_service_notification_handler;
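Other notifications with oversized output can be looked for in the same way. The sketch below is an illustration rather than part of the original reply: `long_notifications` is a hypothetical helper, the 255 limit is taken from the `varchar(255)` definition of the `output` column shown later in the thread, and the field order (contact;host;service;state;command;output) is assumed from the log line above. Depending on the awk implementation and locale, `length` may count bytes rather than characters.

```shell
# Hypothetical helper (not from the thread): reads Nagios log lines on stdin
# and prints "host;service;length" for every SERVICE NOTIFICATION whose
# output text is longer than the given limit (default 255).
long_notifications() {
    local limit="${1:-255}"
    awk -v limit="$limit" '
        {
            marker = "SERVICE NOTIFICATION: "
            pos = index($0, marker)
            if (pos == 0) next
            rest = substr($0, pos + length(marker))
            # Assumed layout: contact;host;service;state;command;output
            n = split(rest, f, ";")
            if (n < 6) next
            # Rejoin fields 6..n in case the output itself contains ";".
            out = f[6]
            for (i = 7; i <= n; i++) out = out ";" f[i]
            if (length(out) > limit + 0)
                print f[2] ";" f[3] ";" length(out)
        }'
}
```

For example, `long_notifications 255 < /usr/local/nagios/var/nagios.log`, where the log path is an assumption about a default install; the line quoted above comes from syslog, so that file could be fed in instead.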
Re: table nagios_notifications crashed (in every hour)
Posted: Mon Nov 18, 2019 4:49 am
by eycklin
Hi,
We disabled some service checks with long text output.
But the DB crash still happens.
Log files are attached.
Please give some advice.
Thanks,
Eyck Lin
Re: table nagios_notifications crashed (in every hour)
Posted: Mon Nov 18, 2019 10:42 am
by tgriep
Let's drop and recreate the nagios_notifications table.
Create a file called notifications.sql and put the following entries in it.
Code: Select all
DROP TABLE IF EXISTS `nagios_notifications`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `nagios_notifications` (
`notification_id` int(11) NOT NULL AUTO_INCREMENT,
`instance_id` smallint(6) NOT NULL DEFAULT '0',
`notification_type` smallint(6) NOT NULL DEFAULT '0',
`notification_reason` smallint(6) NOT NULL DEFAULT '0',
`object_id` int(11) NOT NULL DEFAULT '0',
`start_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`start_time_usec` int(11) NOT NULL DEFAULT '0',
`end_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`end_time_usec` int(11) NOT NULL DEFAULT '0',
`state` smallint(6) NOT NULL DEFAULT '0',
`output` varchar(255) CHARACTER SET latin1 NOT NULL DEFAULT '',
`long_output` text NOT NULL,
`escalated` smallint(6) NOT NULL DEFAULT '0',
`contacts_notified` smallint(6) NOT NULL DEFAULT '0',
PRIMARY KEY (`notification_id`),
UNIQUE KEY `instance_id` (`instance_id`,`object_id`,`start_time`,`start_time_usec`),
KEY `start_time` (`start_time`),
KEY `object_id` (`object_id`)
) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COMMENT='Historical record of host and service notifications';
Save the file.
Stop the processes from accessing the database by running
Code: Select all
systemctl stop nagios
systemctl stop ndo2db
Run the following as root to drop and recreate the table.
Code: Select all
mysql -u root -pnagiosxi --database nagios < notifications.sql
Check whether the table was recreated by running the following
Code: Select all
echo 'desc nagios_notifications;' |mysql -t -u root -pnagiosxi nagios
It should look like this:
Code: Select all
+---------------------+--------------+------+-----+---------------------+----------------+
| Field               | Type         | Null | Key | Default             | Extra          |
+---------------------+--------------+------+-----+---------------------+----------------+
| notification_id     | int(11)      | NO   | PRI | NULL                | auto_increment |
| instance_id         | smallint(6)  | NO   | MUL | 0                   |                |
| notification_type   | smallint(6)  | NO   |     | 0                   |                |
| notification_reason | smallint(6)  | NO   |     | 0                   |                |
| object_id           | int(11)      | NO   | MUL | 0                   |                |
| start_time          | datetime     | NO   | MUL | 0000-00-00 00:00:00 |                |
| start_time_usec     | int(11)      | NO   |     | 0                   |                |
| end_time            | datetime     | NO   |     | 0000-00-00 00:00:00 |                |
| end_time_usec       | int(11)      | NO   |     | 0                   |                |
| state               | smallint(6)  | NO   |     | 0                   |                |
| output              | varchar(255) | NO   |     |                     |                |
| long_output         | text         | NO   |     | NULL                |                |
| escalated           | smallint(6)  | NO   |     | 0                   |                |
| contacts_notified   | smallint(6)  | NO   |     | 0                   |                |
+---------------------+--------------+------+-----+---------------------+----------------+
If so, restart the processes
Code: Select all
systemctl start ndo2db
systemctl start nagios
See if the issue is gone.
If not, run the following as root so I can get the version of MySQL and the number of connections.
Code: Select all
mysql -V
mysql -u root -pnagiosxi -e "show global status like '%used_connections%'; show variables like 'max_connections';"
Then get the following files from the server and post them.
and all of the files in the following folder if any exist.
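The two values returned by the connection query above can be reduced to a single usage figure. This is a minimal sketch and not part of the original reply: `connection_usage` is a hypothetical helper, and it assumes the status variable is reported under the exact name `Max_used_connections`.

```shell
# Hypothetical helper (not from the thread): reads the output of the
# "show global status ... ; show variables ..." command on stdin and prints
# "used/max (percent%)". It returns non-zero if either value is missing.
connection_usage() {
    awk '
        {
            gsub(/\|/, " ")    # tolerate table-style output as well as tabs
            if ($1 == "Max_used_connections") used = $2
            if ($1 == "max_connections")      max  = $2
        }
        END {
            if (used == "" || max == "" || max + 0 == 0) exit 1
            printf "%d/%d (%d%%)\n", used, max, used * 100 / max
        }'
}
```

To use it, append `| connection_usage` to the `mysql -e` command above. A figure close to 100% would suggest that the connection limit is being reached, which is one thing this query helps rule in or out.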
Re: table nagios_notifications crashed (in every hour)
Posted: Tue Nov 19, 2019 8:21 pm
by eycklin
Hi,
After dropping and recreating the nagios_notifications table, it's still the same.
The DB still crashes.
The log file is attached.
Please help.
Thanks,
Eyck Lin
Re: table nagios_notifications crashed (in every hour)
Posted: Wed Nov 20, 2019 10:31 am
by tgriep
Let's increase the size of the field in the table by running this.
Code: Select all
echo "alter table nagios_notifications modify output text NOT NULL;" | mysql -uroot -pnagiosxi nagios
Then, run a full repair of the database and let us know if this fixes the corruption.
Code: Select all
mysqlcheck -f -r -u root -pnagiosxi --all-databases --use-frm
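To confirm that the ALTER took effect, the `desc` command used earlier in the thread can be piped through a small filter that prints the type of one column. This is a sketch only: `column_type` is a hypothetical helper, not something from the thread or from Nagios XI.

```shell
# Hypothetical helper (not from the thread): reads the table-style output of
# "desc <table>;" on stdin and prints the Type of the named column.
column_type() {
    awk -F'|' -v col="$1" '
        NF >= 3 {
            name = $2; type = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", type)
            if (name == col) print type
        }'
}
```

For example, `echo 'desc nagios_notifications;' | mysql -t -u root -pnagiosxi nagios | column_type output` should print `text` once the change has been applied, instead of `varchar(255)`.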