Frequent increase in load

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
deek
Posts: 194
Joined: Fri Apr 26, 2019 2:01 am

Frequent increase in load

Post by deek »

Hey,

The Nagios XI load spikes very frequently. We use the check_xi_by_ssh plugin, and agentless monitoring is set up.
The server has 16 GB RAM and 16 CPU cores. There are 432 hosts and 8568 services in total.
Please let us know how we can reduce the load spikes.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Frequent increase in load

Post by benjaminsmith »

Hi @deek,

High load is often the result of corrupted database tables, so I would check the logs for any error messages. If needed, run the following command as root to repair the database tables.

Code: Select all

/usr/local/nagiosxi/scripts/repair_databases.sh
A few more questions...

Is this a production or test server?

Are check results being processed (updated) in the GUI?

What is the average check interval for the check_by_ssh services?

Lastly, please send over the profile and we'll review the logs for you. Thanks, Benjamin

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
deek
Posts: 194
Joined: Fri Apr 26, 2019 2:01 am

Re: Frequent increase in load

Post by deek »

Hello,

1. It is a production server.
2. Check results are being updated.
3. Check interval = 10 min, retry interval = 5 min, max check attempts = 1. We have set this for all hosts and services.
4. I have sent you the profile by DM.

Currently we have 6964 hosts and 6751 services.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Frequent increase in load

Post by benjaminsmith »

Hi @deek

Thank you for the system profile. The main issue with load looks to be related to the crashed database tables.
210310 1:53:10 [ERROR] mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
ndo2db: Warning: queue send error, retrying...
Mar 10 00:22:45 lxappnagprd031 ndo2db: Error: mysql_query() failed for 'INSERT INTO nagios_logentries SET instance_id='1',
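If you want to look for the same symptom in your own logs, here is a minimal sketch that pulls the crashed table names out of log text. The function name `crashed_tables` is just an illustration, and the log path in the usage line is an assumption; the actual location depends on your MySQL/MariaDB configuration.

```shell
# Print the unique table names that mysqld reports as crashed.
# Reads log text on stdin and matches lines shaped like the
# "[ERROR] mysqld: Table '...' is marked as crashed" entry above.
crashed_tables() {
  sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p" | sort -u
}

# Usage (the log path is an assumption; adjust it for your system):
# crashed_tables < /var/log/mariadb/mariadb.log
```

Lines that do not match the pattern, such as the ndo2db warnings, are ignored.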
Let's go ahead and run the repair script. Log in as root and run the following command from the CLI.

Code: Select all

/usr/local/nagiosxi/scripts/repair_databases.sh
That should help improve the load. However, the message queues have backed up, so run the following commands to restart the software stack and clear the kernel message queue.

Code: Select all

systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
systemctl restart mariadb
systemctl restart httpd
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
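
To preview which queues the `ipcs`/`ipcrm` loop above would touch before removing anything, a dry-run sketch like the following may help. It assumes the usual Linux `ipcs -q` column layout (key, msqid, owner, ...), and `queue_ids_for` is a hypothetical helper name, not part of Nagios XI.

```shell
# Print the msqid of every message queue owned by the given user
# (default: nagios). Reads `ipcs -q` output on stdin.
# Matching on the owner column (field 3) avoids accidental hits
# on other columns, which a plain `grep nagios` could produce.
queue_ids_for() {
  awk -v owner="${1:-nagios}" '$3 == owner { print $2 }'
}

# Dry run: list the queue IDs without calling ipcrm.
# ipcs -q | queue_ids_for nagios
```

This only lists IDs; the removal itself is still done by the `ipcrm -q` call in the commands above.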
Next, run the top command and let me know whether the load averages have dropped from what's in the profile.

Code: Select all

top - 01:53:20 up 4 days, 11:16,  0 users,  load average: 278.19, 327.16, 356.20
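
To put a load figure like this in context, it can help to divide it by the number of CPU cores. The sketch below does that for a `top` or `uptime` header line; `load_per_core` is a hypothetical helper name, and it assumes the load values use a dot as the decimal separator.

```shell
# Extract the 1-minute load average from a `top` or `uptime` header
# line and divide it by the number of CPU cores.
#   $1 = header line, $2 = core count
load_per_core() {
  printf '%s\n' "$1" | awk -v cores="$2" '
    match($0, /load average: [0-9.]+/) {
      # Skip the 14-character "load average: " prefix to get the number.
      load = substr($0, RSTART + 14, RLENGTH - 14)
      printf "%.2f\n", load / cores
    }'
}

# Usage on a live system:
# load_per_core "$(uptime)" "$(nproc)"
```

For the line above and the 16 cores mentioned earlier in the thread, this prints 17.39. As a rough rule of thumb, a value well above 1 per core suggests that processes are waiting for CPU or I/O.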
Going forward, if you have ongoing issues with crashed database tables, it would be beneficial to convert the storage engine of the nagios database from MyISAM to InnoDB. We have instructions for that in our knowledgebase. Be sure to take a full backup before making any changes.

Database Storage Engine and High CPU usage in Nagios XI
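
Before following the article, you may want to see which tables are still on MyISAM. The sketch below only filters a table/engine listing and changes nothing. `myisam_tables` is a hypothetical helper name, and the `mysql` invocation in the usage line is an assumption: it omits credentials and uses the database name nagios seen in the log excerpt. The knowledgebase article remains the reference for the actual conversion.

```shell
# Read tab-separated "table<TAB>engine" lines on stdin and print the
# names of the tables whose engine is MyISAM (case-insensitive).
myisam_tables() {
  awk -F '\t' 'toupper($2) == "MYISAM" { print $1 }'
}

# Usage (assumed invocation; add credentials as needed):
# mysql -N -B -e "SELECT table_name, engine FROM information_schema.tables WHERE table_schema='nagios'" | myisam_tables
```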

Regards,
Benjamin