Nagios Maintenance
Posted: Fri Apr 20, 2018 9:40 am
by FrontlineIT
Good morning Nagios team,
We have had a few issues with our Nagios instance where we had to reboot the server. This morning around 1am, our Nagios instance's services stopped, and this has happened a few times over the last few months. Would it be possible for someone to take a look at the logs or perform some sort of maintenance to check the health of our Nagios instance?
Re: Nagios Maintenance
Posted: Fri Apr 20, 2018 2:04 pm
by cdienger
Feel free to PM me a profile (Admin > System Config > System Profile > Download System Profile) and I'd be glad to take a look at things. Note that the profile _can_ be too large to send, usually because it contains a nested zip file holding the configuration. Feel free to remove that zip if needed. The profile's filename ends in .tar.gz and begins with a Unix epoch time. For example: 1524250995.tar.gz.
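If it helps to confirm which profile is which, the epoch prefix in the filename can be decoded into a readable time. A minimal Python sketch, assuming the filename starts with the epoch seconds as in the example above; the helper name `profile_timestamp` is made up for illustration and is not part of Nagios XI:

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def profile_timestamp(filename: str) -> datetime:
    """Return the UTC time encoded in the leading epoch digits of a filename."""
    name = Path(filename).name
    match = re.match(r"(\d+)", name)  # leading run of digits = epoch seconds
    if match is None:
        raise ValueError(f"{name!r} does not begin with a Unix epoch time")
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


# The example filename from the post above.
print(profile_timestamp("1524250995.tar.gz").isoformat())
# prints 2018-04-20T19:03:15+00:00
```

The result is in UTC; convert to local time if you want to match it against when the download was made.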
Re: Nagios Maintenance
Posted: Mon Apr 23, 2018 2:32 pm
by cdienger
The profile showed database corruption, which can be caused by performance issues on the machine. Follow
https://assets.nagios.com/downloads/nag ... ios-XI.pdf to set up a ramdisk and apply the other tweaks it describes to improve performance.
Re: Nagios Maintenance
Posted: Wed Apr 25, 2018 9:57 pm
by FrontlineIT
What did you see that indicated database corruption?
Re: Nagios Maintenance
Posted: Thu Apr 26, 2018 9:30 am
by cdienger
180419 23:00:02 [Warning] mysqld: Disk is full writing './nagios/nagios_servicestatus.MYD' (Errcode: 28). W
180420 6:59:05 InnoDB: Shutdown completed; log sequence number 86951804001
180420 6:59:44 [ERROR] mysqld: Table './nagios/nagios_servicestatus' is marked as crashed and should be repaired
180420 6:59:44 [Warning] Checking table: './nagios/nagios_servicestatus'
180420 6:59:44 [Warning] Recovering table: './nagios/nagios_servicestatus'
180420 6:59:45 [Note] Found 4520 of 4521 rows when repairing './nagios/nagios_servicestatus'
Drive space no longer appears to be a problem and the log shows the table being repaired, but I would still run the repair script to make sure everything else is good.
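To spot the same symptoms in a longer log, the two telltale messages can be picked out with a small script. This is only a rough sketch: the sample lines are copied from the excerpt above, the regular expressions are assumptions based on those lines alone, and `summarize` is a hypothetical helper rather than anything shipped with Nagios or MySQL. Errcode 28 is the operating system's "no space left on device" error (ENOSPC).

```python
import errno
import re

# Sample lines copied from the log excerpt above.
LOG_LINES = [
    "180419 23:00:02 [Warning] mysqld: Disk is full writing "
    "'./nagios/nagios_servicestatus.MYD' (Errcode: 28).",
    "180420 6:59:44 [ERROR] mysqld: Table './nagios/nagios_servicestatus' "
    "is marked as crashed and should be repaired",
    "180420 6:59:45 [Note] Found 4520 of 4521 rows when repairing "
    "'./nagios/nagios_servicestatus'",
]

# Patterns inferred from the lines above; other MySQL versions may word these differently.
CRASHED = re.compile(r"Table '(?P<table>[^']+)' is marked as crashed")
DISK_FULL = re.compile(
    r"Disk is full writing '(?P<path>[^']+)' \(Errcode: (?P<code>\d+)\)"
)


def summarize(lines):
    """Return (crashed tables, disk-full events) found in the given log lines."""
    crashed, disk_full = [], []
    for line in lines:
        hit = CRASHED.search(line)
        if hit:
            crashed.append(hit.group("table"))
        hit = DISK_FULL.search(line)
        if hit:
            disk_full.append((hit.group("path"), int(hit.group("code"))))
    return crashed, disk_full


crashed, disk_full = summarize(LOG_LINES)
print("Crashed tables:", crashed)
for path, code in disk_full:
    # errno.errorcode maps the numeric code to its symbolic name.
    print("Disk full while writing", path, "->", errno.errorcode.get(code, "unknown"))
```

To run it against a real log, replace `LOG_LINES` with the lines read from the MySQL error log on the server; its location varies by install.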