Error in Nagios after kernel panic
Posted: Tue Jul 28, 2015 5:11 am
Hi people.
I am working on a problem that began when the system crashed with a kernel panic and had to be rebooted.
After the reboot I see a general malfunction of the Nagios dashboard: it does not refresh, the check buttons do not work, and no emails are sent.
A simple restart of nagios fixes this for a few minutes, but the problem soon comes back.
In /var/log/messages I can see this:
ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 6400 of 15737 messages and 6553600 of 6553600 bytes in the queue. See README for kernel tuning options.
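A small sketch for reading that warning: the message count is well under its limit (6400 of 15737), but the byte count has reached its limit (6553600 of 6553600). The helper below pulls the byte figures out of one warning line and prints the percentage in use. The function name queue_bytes_percent is hypothetical, and the pattern assumes the message wording shown above.

```shell
# Hypothetical helper: print how full the queue is, in percent of bytes,
# from one ndo2db "Retrying message send" warning line.
queue_bytes_percent() {
    # $1: the warning line as it appears in /var/log/messages
    printf '%s\n' "$1" |
        sed -n 's/.* and \([0-9][0-9]*\) of \([0-9][0-9]*\) bytes in the queue.*/\1 \2/p' |
        awk '{ printf "%d\n", ($1 * 100) / $2 }'
}
```

For example, `queue_bytes_percent "$(grep 'Retrying message send' /var/log/messages | tail -n 1)"` reports the most recent value. A queue that stays at 100% may mean that ndo2db is not draining it fast enough, rather than that the limit itself is too small.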
Restarting only ndo2db also fixes the problem for a few minutes, but it soon returns with:
ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README
We had never needed to modify the kernel queue parameters before, but we did so now:
echo 65536000 > /proc/sys/kernel/msgmnb
echo 65536000 > /proc/sys/kernel/msgmax
service ndo2db restart
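Values written through /proc like this are lost at the next reboot, so it helps to record the previous limits before changing them. The sketch below prints the current limits in sysctl.conf syntax. The function name show_msg_limits and its optional directory argument are hypothetical (the argument exists only so the function can be tried against a test directory).

```shell
# Print the current System V message-queue limits in "key = value" form.
# $1 (optional, hypothetical): directory to read from; defaults to the
# real location, /proc/sys/kernel.
show_msg_limits() {
    dir="${1:-/proc/sys/kernel}"
    for name in msgmnb msgmax msgmni; do
        # Skip any parameter that is not present or not readable
        if [ -r "$dir/$name" ]; then
            printf 'kernel.%s = %s\n' "$name" "$(cat "$dir/$name")"
        fi
    done
}
```

Running `show_msg_limits > msg-limits.baseline` before the change keeps the baseline at hand. To make new values permanent, lines in the same format can be added to /etc/sysctl.conf and loaded with `sysctl -p`.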
This solved the problem and everything was OK for a little longer, but after 3 hours the malfunction came back.
So I checked the database, and I saw that since the reboot there has been an ERROR message for the nagios database. From /var/log/mysql.log:
150727 04:02:27 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
150727 4:02:28 InnoDB: Initializing buffer pool, size = 8.0M
150727 4:02:28 InnoDB: Completed initialization of buffer pool
150727 4:02:28 InnoDB: Started; log sequence number 0 44243
150727 4:02:29 [Note] Event Scheduler: Loaded 0 events
150727 4:02:29 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: XXXX Source distribution
150727 4:10:36 [ERROR] Got error 134 when reading table './nagios/nagios_servicestatus'
150727 4:22:42 [ERROR] Got error 134 when reading table './nagios/nagios_servicestatus'
150727 4:24:33 [Note] /usr/libexec/mysqld: Normal shutdown
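To see quickly which tables are affected and how often, the error lines can be summarized. This is a minimal sketch that assumes the log format shown above. The function name crashed_tables is hypothetical.

```shell
# Read a MySQL error log on stdin and print one line per (error code, table)
# pair in the form: "<code> <table path> <number of occurrences>".
crashed_tables() {
    sed -n "s/.*\[ERROR\] Got error \([0-9][0-9]*\) when reading table '\([^']*\)'.*/\1 \2/p" |
        sort | uniq -c | awk '{ print $2, $3, $1 }'
}
```

Usage: `crashed_tables < /var/log/mysql.log`. The `perror` utility shipped with MySQL translates the numeric code, for example `perror 134`.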
I consulted the official MySQL documentation and found this simple solution to fix the error:
mysql> use nagios;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> repair table nagios_servicestatus;
+-----------------------------+--------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+-----------------------------+--------+----------+----------+
| nagios.nagios_servicestatus | repair | status | OK |
+-----------------------------+--------+----------+----------+
1 row in set (0.00 sec)
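The crash may have damaged other tables besides nagios_servicestatus, so it can be worth checking every table in the database. The sketch below only generates the CHECK TABLE statements from a list of table names. The function name check_statements is hypothetical, and connection options (user, password) are left out.

```shell
# Emit one CHECK TABLE statement per table name read from stdin.
# $1: database name. Empty input lines are ignored.
check_statements() {
    db="$1"
    while IFS= read -r table; do
        if [ -n "$table" ]; then
            printf 'CHECK TABLE `%s`.`%s`;\n' "$db" "$table"
        fi
    done
}
```

Usage, assuming the mysql client can connect without further options: `mysql -N -e 'SHOW TABLES' nagios | check_statements nagios | mysql`. The `mysqlcheck --check nagios` command should give an equivalent result in one step.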
I have now reset the kernel queue parameters to their baseline values and restarted ndo2db.
For the time being everything is OK. I suppose the system lost some values in the crash and that caused the failure.
Any ideas?
Thanks.