Nagios XI - High Load (mysqld/httpd)
Posted: Wed Dec 04, 2019 7:57 am
Nagios XI 5.6.8
CentOS release 6.10 (Final)
64 bit
8 CPU
16GB Mem
load average: 104.40, 68.17, 63.40
For the last few days, following an upgrade a few days earlier, the server load has been spiking to well over 100 for long periods.
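To put that figure in perspective, the load average can be divided by the CPU count. The sketch below is only an illustration: `load_per_cpu` is a hypothetical helper name, and the numbers are the ones quoted above.

```shell
#!/bin/sh
# load_per_cpu LOAD CPUS: print the load average per CPU, to two decimals.
# (Hypothetical helper, not part of Nagios XI.)
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# 1-minute load average and CPU count from the post.
load_per_cpu 104.40 8
```

On the server itself the live values could be passed in instead, for example `load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"`. Anything well above 1.00 per CPU means processes are queuing.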
I have restarted the server and its services, checked all the logs, run various repairs on the database, truncated tables, and run dbmaint:
/usr/local/nagiosxi/scripts/repair_databases.sh
service mysqld stop
cd /var/lib/mysql/nagios
myisamchk -r -f nagios_logentries
myisamchk -r -f nagios_notifications
myisamchk -r -f nagios_statehistory
service mysqld start
rm -f /usr/local/nagiosxi/var/dbmaint.lock
php /usr/local/nagiosxi/cron/dbmaint.php
mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_logentries'
mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_notifications'
mysql -u ndoutils -pn@gweb nagios -e 'TRUNCATE TABLE nagios_statehistory'
/usr/local/nagiosxi/scripts/repair_databases.sh
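The three myisamchk calls above can also be written as a loop with a dry-run mode, so the commands can be reviewed before they touch the tables. This is a minimal sketch, not a Nagios-supplied script: `repair_tables` and its prefix argument are hypothetical, and a real run still assumes mysqld is stopped and the working directory is /var/lib/mysql/nagios.

```shell
#!/bin/sh
# Tables named in the post.
TABLES="nagios_logentries nagios_notifications nagios_statehistory"

# repair_tables PREFIX: run one myisamchk repair per table.
# With PREFIX=echo the commands are only printed (dry run);
# with an empty PREFIX they are executed.
repair_tables() {
    prefix="$1"
    for table in $TABLES; do
        $prefix myisamchk -r -f "$table"
    done
}

# Dry run: print the commands instead of executing them.
repair_tables echo
```

Calling `repair_tables ""` would execute the repairs for real.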
But the nom log keeps reporting the error:
Database Error
A database connection error has been detected, please follow the repair prompt below. If the issue persists, please contact Nagios support.
Run the following from the CLI as root to attempt to repair the DB:
/usr/local/nagiosxi/scripts/repair_databases.sh
And /var/log/messages has various entries reporting that it is unable to connect to the DB:
Dec 4 14:46:43 nagiosxi nagios: wproc: GLOBAL SERVICE EVENTHANDLER job 3024 from worker Core Worker 21670 is a non-check helper but exited with return code 1
Dec 4 14:46:43 nagiosxi nagios: wproc: early_timeout=0; exited_ok=1; wait_status=256; error_code=0;
Dec 4 14:46:43 nagiosxi nagios: wproc: stdout line 01: UNABLE TO CONNECT TO DB - EXITING!
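To see how often this happens, the matching lines can be counted. The sketch below assumes the message text stays exactly as quoted above; `count_db_errors` is a hypothetical helper name, and the sample file stands in for /var/log/messages.

```shell
#!/bin/sh
# count_db_errors FILE: count the log lines that carry the
# "UNABLE TO CONNECT TO DB" message.
count_db_errors() {
    grep -c 'UNABLE TO CONNECT TO DB' "$1"
}

# Small sample built from two of the lines quoted above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Dec 4 14:46:43 nagiosxi nagios: wproc: early_timeout=0; exited_ok=1; wait_status=256; error_code=0;
Dec 4 14:46:43 nagiosxi nagios: wproc: stdout line 01: UNABLE TO CONNECT TO DB - EXITING!
EOF
count_db_errors "$sample"   # prints 1
rm -f "$sample"
```

On the server this would be `count_db_errors /var/log/messages`, run as root.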
Any advice on how to find and fix the issue would be appreciated.
Support update: Profile.zip downloaded and shared with team.