Frequent increase in load
Posted: Tue Mar 02, 2021 9:55 am
by deek
Hey,
The Nagios XI server load spikes very frequently. We use the check_xi_by_ssh plugin with agentless monitoring.
The server has 16 GB RAM and 16 CPU cores. It monitors 432 hosts and 8568 services in total.
Please let us know how we can reduce the load spikes.
Re: Frequent increase in load
Posted: Wed Mar 03, 2021 10:21 am
by benjaminsmith
Hi @deek,
High load is often the result of corrupted database tables, so I would check the logs for any error messages. If needed, run the following command as root to repair the database tables.
Code:
/usr/local/nagiosxi/scripts/repair_databases.sh
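Before running the repair, it can help to confirm which tables the database log reports as crashed. The following is only a sketch: `crashed_tables` is a hypothetical helper name, and the log path in the usage comment is an assumption that varies by distribution and database version.

```shell
#!/usr/bin/env bash
# Print the unique table names that a MySQL/MariaDB error log reports as crashed.
# Reads log text on stdin so it can be pointed at any log file.
crashed_tables() {
    grep -o "Table '[^']*' is marked as crashed" \
        | sed "s/^Table '//; s/' is marked as crashed\$//" \
        | sort -u
}

# Assumed log location; adjust it to the path your server actually uses.
# crashed_tables < /var/log/mariadb/mariadb.log
```

If the helper prints nothing, the log contains no crashed-table messages in that format, and the load is more likely coming from somewhere else.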
A few more questions...
Is this a production or test server?
Are check results being processed (updated) in the GUI?
What is the average check interval for the check_by_ssh services?
Lastly, please send over the profile and we'll review the logs for you.
Thanks, Benjamin
To send us your system profile:
1. Log in to the Nagios XI GUI using a web browser.
2. Click the "Admin" > "System Profile" menu.
3. Click the "Download Profile" button.
Re: Frequent increase in load
Posted: Wed Mar 10, 2021 2:04 am
by deek
Hello,
1. It is a production server.
2. Check results are being updated.
3. Check interval = 10 min, retry interval = 5 min, max check attempts = 1. We have applied these settings to all hosts and services.
4. I have sent you the profile by DM.
Currently we have 6964 hosts and 6751 services.
Re: Frequent increase in load
Posted: Wed Mar 10, 2021 4:02 pm
by benjaminsmith
Hi @deek,
Thank you for the system profile. The main issue with load looks to be related to the crashed database tables.
210310 1:53:10 [ERROR] mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
ndo2db: Warning: queue send error, retrying...
Mar 10 00:22:45 lxappnagprd031 ndo2db: Error: mysql_query() failed for 'INSERT INTO nagios_logentries SET instance_id='1',
Let's go ahead and run the repair script. Log in as root and run the following command from the CLI.
Code:
/usr/local/nagiosxi/scripts/repair_databases.sh
That should help improve the load. However, the message queues have also backed up, so run the following commands to restart the software stack and clear the kernel message queue.
Code:
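# Stop cron and the Nagios services
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
# Kill any processes still running as the nagios user
pkill -9 -u nagios
# Remove the kernel message queues that belong to nagios
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
# Clear stale lock files
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
# Restart the database and web server, then bring the stack back up
systemctl restart mariadb
systemctl restart httpd
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond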
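To check whether the queue cleanup took effect, one option is to count the message queues still owned by the nagios user. This is a rough sketch: `nagios_queue_count` is a hypothetical helper, and it assumes the usual Linux `ipcs -q` column layout in which the owner is the third field.

```shell
#!/usr/bin/env bash
# Count message queues owned by the nagios user in `ipcs -q` output read from stdin.
# Assumes the owner name is the third whitespace-separated column.
nagios_queue_count() {
    awk '$3 == "nagios" { n++ } END { print n + 0 }'
}

# On the server:
# ipcs -q | nagios_queue_count
```

Once ndo2db is running again it normally creates a queue of its own, so a small non-zero count after the restart is expected. What matters is that the old backlog is gone.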
Next, run a top command and let me know if the load averages have dropped from what's in the profile.
Code:
top - 01:53:20 up 4 days, 11:16, 0 users, load average: 278.19, 327.16, 356.20
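For a quick comparison against the figures above, the three load averages can be pulled out of a `top` or `uptime` header line. A minimal sketch, with `load_averages` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Extract the 1, 5 and 15 minute load averages from a top/uptime header line on stdin.
load_averages() {
    sed -n 's/.*load average: *//p' | tr -d ','
}

# Live reading on the server:
# uptime | load_averages
```

As a rule of thumb, a load average well above the number of CPU cores (16 on this server) means work is queuing up.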
Going forward, if you have ongoing issues with crashed database tables, it would be beneficial to convert the storage engine on the nagios database from MyISAM to InnoDB. We have instructions for that in our knowledge base. Be sure to take a full backup before making any changes.
Database Storage Engine and High CPU usage in Nagios XI
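To see which tables would be affected by a conversion, the storage engine of each table can be read from `information_schema`. This sketch is not taken from the knowledge base article: `myisam_tables_sql` is a hypothetical helper, the schema name `nagios` comes from the log lines above, and the `mysql` login shown in the comment is an assumption about your setup.

```shell
#!/usr/bin/env bash
# Build a query that lists the tables in a schema still using the MyISAM engine.
myisam_tables_sql() {
    local schema="${1:-nagios}"
    printf "SELECT table_name, engine FROM information_schema.tables WHERE table_schema = '%s' AND engine = 'MyISAM';\n" "$schema"
}

# Assumed invocation; the user and password prompt depend on your installation.
# mysql -u root -p -e "$(myisam_tables_sql nagios)"
```

The query is read-only, so it is safe to run before taking the backup.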
Regards,
Benjamin