Hi all.
I am working on a problem where the system crashed with a kernel panic and had to be rebooted.
After the reboot, the Nagios dashboard is generally malfunctioning: it does not refresh, the check buttons do not work, and no email notifications are sent.
A simple restart of Nagios fixes it for a few minutes, but the problem quickly comes back.
I can see the following in /var/log/messages:
ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 6400 of 15737 messages and 6553600 of 6553600 bytes in the queue. See README for kernel tuning options.
Restarting only ndo2db also fixes the problem for a few minutes, but it soon returns:
ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README
We have never needed to modify the kernel queue parameters before, but we did so:
echo 65536000 > /proc/sys/kernel/msgmnb
echo 65536000 > /proc/sys/kernel/msgmax
service ndo2db restart
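For anyone hitting the same message, here is a minimal sketch of how to inspect the current queue limits and in-flight usage before and after changing them. It assumes a Linux /proc filesystem; `ipcs` comes from util-linux, and the sysctl lines are only the persistent equivalent of the `echo` workaround above.

```shell
# Current System V message queue limits (Linux /proc paths)
cat /proc/sys/kernel/msgmnb   # max total bytes per queue
cat /proc/sys/kernel/msgmax   # max bytes in a single message

# Existing queues and how full they are (ipcs is part of util-linux)
command -v ipcs >/dev/null && ipcs -q || true

# To make enlarged limits survive a reboot, persist them via sysctl:
# add to /etc/sysctl.conf and reload with `sysctl -p`, e.g.
#   kernel.msgmnb = 65536000
#   kernel.msgmax = 65536000
```

Note that writing to /proc (or running `sysctl -p`) requires root, and the values only take effect for queues created after the change is applied.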
This solved the problem for a little longer, but after about 3 hours the malfunction came back.
So I checked the database and saw ERROR messages in the nagios database dating from the reboot. From /var/log/mysql.log:
150727 04:02:27 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
150727 4:02:28 InnoDB: Initializing buffer pool, size = 8.0M
150727 4:02:28 InnoDB: Completed initialization of buffer pool
150727 4:02:28 InnoDB: Started; log sequence number 0 44243
150727 4:02:29 [Note] Event Scheduler: Loaded 0 events
150727 4:02:29 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: XXXX Source distribution
150727 4:10:36 [ERROR] Got error 134 when reading table './nagios/nagios_servicestatus'
150727 4:22:42 [ERROR] Got error 134 when reading table './nagios/nagios_servicestatus'
150727 4:24:33 [Note] /usr/libexec/mysqld: Normal shutdown
I consulted the official MySQL documentation and found this simple way to fix the error:
mysql> use nagios;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> repair table nagios_servicestatus;
+-----------------------------+--------+----------+----------+
| Table                       | Op     | Msg_type | Msg_text |
+-----------------------------+--------+----------+----------+
| nagios.nagios_servicestatus | repair | status   | OK       |
+-----------------------------+--------+----------+----------+
1 row in set (0.00 sec)
Then I reset the kernel queue parameters to their baseline values and restarted ndo2db.
For the time being everything is OK. I suppose the system lost some values during the panic, and that produced the failure.
Any idea?
Thanks.
Re: Error in nagios after kernel panic
That was definitively the problem. The table corrupted by the system hang caused the failure in Nagios and ndo2db. I have read a lot about tuning the kernel parameters in response to the message "ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README", but that is only a workaround and does not fix the underlying problem.
I have read another conversation in this forum (some years old) that ended without conclusions about this problem. Maybe it was the same issue?
Re: Error in nagios after kernel panic
Are the MySQL tables growing too large? What is the output of the following command?
Code: Select all
ll -Sh /var/lib/mysql/nagios | head
You may need to truncate some of the MySQL tables and rerun the database repair script as described in the document below:
https://assets.nagios.com/downloads/nag ... tabase.pdf
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Error in nagios after kernel panic
Hello Imitchev:
The nagios database isn't too big. The sizes:
total 2,5G
-rw-rw---- 1 mysql mysql 2,1G jul 29 03:22 nagios_logentries.MYD
-rw-rw---- 1 mysql mysql 153M jul 29 03:22 nagios_logentries.MYI
-rw-rw---- 1 mysql mysql 135M jul 29 03:24 nagios_servicechecks.MYD
-rw-rw---- 1 mysql mysql 67M jul 29 03:24 nagios_systemcommands.MYD
-rw-rw---- 1 mysql mysql 28M jul 29 03:24 nagios_servicechecks.MYI
-rw-rw---- 1 mysql mysql 24M jul 29 00:00 nagios_timedevents.MYI
-rw-rw---- 1 mysql mysql 24M jul 29 00:00 nagios_timedevents.MYD
-rw-rw---- 1 mysql mysql 22M jul 29 03:24 nagios_systemcommands.MYI
-rw-rw---- 1 mysql mysql 8,7M jul 29 03:24 nagios_hostchecks.MYD
After 24 hours the problem remains definitively solved, thanks to the MySQL "repair table" command and a restart of the services.
Many thanks for the documentation you gave me. I have added it to my collection of Nagios operations notes.
Re: Error in nagios after kernel panic
I am glad your issue has been resolved! I will be locking this topic and marking it as resolved. If you have any other, unrelated issues, please start a new thread.
FYI, if you don't care about historical log data, you could truncate the "nagios_logentries" table, which is the largest one you have (2,1G). If you decide to do so, you will need to rerun the database repair script as described in the document I mentioned in my previous post. Truncating this table should improve performance a bit.
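For completeness, truncating it from the mysql client would look roughly like this. This is a sketch, not something to run blindly: TRUNCATE irreversibly discards all rows, so only do it if you really don't need the historical log entries, and rerun the repair script afterwards.

```sql
-- WARNING: irreversibly discards all historical Nagios log entries
USE nagios;
TRUNCATE TABLE nagios_logentries;
```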