Hey,
The load on our Nagios XI server spikes very frequently. We use the check_xi_by_ssh plugin, and agentless monitoring is set up.
The server has 16 GB of RAM and 16 CPU cores. It monitors 432 hosts and 8568 services in total.
Please let us know how we can reduce the load spikes.
Frequent increase in load
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: Frequent increase in load
Hi @deek,
High load is often the result of corrupted database tables, so I would check the logs for any error messages. If needed, run the following command as root to repair the database tables.
Code: Select all
/usr/local/nagiosxi/scripts/repair_databases.sh
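Before running the repair, it can help to see which tables the database server has actually flagged. The following is only a rough sketch: `crashed_tables` is a hypothetical helper name, it assumes the log contains the usual MySQL/MariaDB "is marked as crashed" message, and the log path in the usage comment is an assumption that varies by distribution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the tables the database log reports as crashed.
# Reads log text on stdin and prints each affected table once.
crashed_tables() {
    grep -o "Table '[^']*' is marked as crashed" \
        | sed "s/^Table '//; s/' is marked as crashed\$//" \
        | sort -u
}

# Example usage (the log path is an assumption, adjust for your system):
#   crashed_tables < /var/log/mariadb/mariadb.log
```

An empty result suggests the load has another cause, in which case the repair script is unlikely to help much on its own.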
A few more questions:
Is this a production or test server?
Are check results being processed (updated) in the GUI?
What is the average check interval for the check_by_ssh services?
Lastly, please send over your system profile and we'll review the logs for you. Thanks, Benjamin
To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Frequent increase in load
Hello,
1. It is a production server.
2. Check results are being updated.
3. Check interval = 10 min, retry interval = 5 min, max check attempts = 1. We have set this for all the hosts and services.
4. I have sent you the profile by DM.
Currently we have 6964 hosts and 6751 services.
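For a rough sense of scale, the figures above can be turned into an average check rate. This is only back-of-the-envelope arithmetic: `checks_per_second` is a hypothetical helper, and it assumes every object is checked exactly once per interval with no retries, which real scheduling will not match exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: average checks per second for COUNT objects
# that are each checked once every INTERVAL_MINUTES minutes.
checks_per_second() {
    awk -v n="$1" -v m="$2" 'BEGIN { printf "%.2f\n", n / (m * 60) }'
}

# Example usage with the numbers from this post:
#   checks_per_second 6751 10   # services at a 10-minute interval
#   checks_per_second 6964 10   # hosts at a 10-minute interval
```

With a 10-minute interval this works out to roughly 11 service checks per second on average, each of which opens an SSH session when run through check_by_ssh.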
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: Frequent increase in load
Hi @deek
Thank you for the system profile. The main issue with load looks to be related to the crashed database tables.
210310 1:53:10 [ERROR] mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
ndo2db: Warning: queue send error, retrying...
Mar 10 00:22:45 lxappnagprd031 ndo2db: Error: mysql_query() failed for 'INSERT INTO nagios_logentries SET instance_id='1',
Let's go ahead and run the repair script. Log in as root and run the following command from the CLI.
Code: Select all
/usr/local/nagiosxi/scripts/repair_databases.sh
That should help improve the load. However, the message queues have backed up, so run the following commands to restart the software stack and clear the kernel message queue.
Code: Select all
# Stop the scheduler and the monitoring stack
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
# Kill any processes still running as the nagios user
pkill -9 -u nagios
# Remove the kernel message queues that belong to nagios
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q $i; done
# Remove stale lock files
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
# Restart the database and web server, then bring the stack back up
systemctl restart mariadb
systemctl restart httpd
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
Next, run a top command and let me know if the load averages have dropped from what's in the profile.
Code: Select all
top - 01:53:20 up 4 days, 11:16, 0 users, load average: 278.19, 327.16, 356.20
Going forward, if you have ongoing issues with crashed database tables, it would be beneficial to convert the storage engine on the nagios database from MyISAM to InnoDB. We have instructions for that on our knowledgebase. Be sure to take a full backup before making any changes.
Database Storage Engine and High CPU usage in Nagios XI
Regards,
Benjamin
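To compare the state before and after the restart, the two measurements discussed in this post can be reduced to single numbers. The sketch below is illustrative only: `load_per_core` and `nagios_queue_messages` are hypothetical helper names, and the second one assumes the util-linux `ipcs -q` column layout (key, msqid, owner, perms, used-bytes, messages).

```shell
#!/usr/bin/env bash
# Hypothetical helper: divide the 1-minute load average found in a
# `top` header line by the number of CPU cores.
load_per_core() {
    printf '%s\n' "$1" | awk -v cores="$2" '
        match($0, /load average: [0-9.]+/) {
            # Skip the 14 characters of "load average: " to get the number
            one_min = substr($0, RSTART + 14, RLENGTH - 14)
            printf "%.2f\n", one_min / cores
        }'
}

# Hypothetical helper: total messages waiting in queues owned by nagios.
# Reads `ipcs -q` output on stdin (assumes owner in column 3, messages in column 6).
nagios_queue_messages() {
    awk '$3 == "nagios" { total += $6 } END { print total + 0 }'
}

# Example usage:
#   load_per_core "$(top -bn1 | head -n 1)" 16
#   ipcs -q | nagios_queue_messages
```

For the profile's header line and the 16 cores mentioned at the start of the thread, `load_per_core` gives about 17, meaning far more runnable work than the CPUs can absorb. A value near or below 1 after the restart, together with a queue count that stays close to zero, would suggest the backlog has cleared.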