
very high CPU load spikes

Posted: Mon Feb 14, 2022 11:13 am
by murdock
Hi,

I've been experiencing frequent, brief, very high CPU load spikes since upgrading (XI 5.8.7 / CentOS 7.9) a few months ago, and the problem is getting worse. In the wake of these spikes I see timeouts and so many queued checks that the system stops processing / freezes up.

Load averages shoot up to 300 or higher as an "explosion" of Python instances starts; then, typically within 3-5 minutes, things drift back to normal.

I'm not seeing anything in the logs to explain this, and searching the forums hasn't turned up anything useful or directly applicable to this situation.

Can I DM a profile.zip and other details to someone?

Rob

Re: very high CPU load spikes

Posted: Mon Feb 14, 2022 6:03 pm
by kfanselow
Hi Rob,

How frequently is this occurring?

Go ahead and PM the profile to me. If you can generate it while the event is occurring, that would be even better.

Also, the next time you observe the event, please run the following as root and PM the file /tmp/info.txt to me as well:

Code: Select all

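# Run as root while the spike is happening
ps -axef > /tmp/info.txt      # full process list
printf '\n========================== %s ==========================\n' "$(date)" >> /tmp/info.txt
ss -na >> /tmp/info.txt       # all sockets and their states
printf '\n========================== %s ==========================\n' "$(date)" >> /tmp/info.txt
sar -A >> /tmp/info.txt       # sysstat history (requires the sysstat package)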
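Because the spikes only last 3-5 minutes, catching one by hand can be hit-or-miss. A minimal sketch of a watcher that runs the same commands automatically when the 1-minute load average crosses a threshold. The function names, the default threshold of 50, the 10-second polling interval and the timestamped output file names are all hypothetical choices, not anything Nagios XI ships; tune the threshold to the host's core count.

```shell
# Hypothetical default threshold; override with LOAD_THRESHOLD=... or an argument.
LOAD_THRESHOLD=${LOAD_THRESHOLD:-50}

# Return 0 if the 1-minute load average in a /proc/loadavg-style line exceeds $1.
# The optional second argument lets you pass a line instead of reading /proc/loadavg.
load_exceeds() {
  local threshold="$1"
  local line="${2:-$(cat /proc/loadavg)}"
  local load="${line%% *}"   # first field is the 1-minute average
  awk -v l="$load" -v t="$threshold" 'BEGIN { exit !(l + 0 > t + 0) }'
}

# Same data as the commands above, written to the file named in $1.
capture_snapshot() {
  local out="$1"
  {
    ps -axef
    printf '\n========================== %s ==========================\n' "$(date)"
    ss -na
    printf '\n========================== %s ==========================\n' "$(date)"
    sar -A
  } > "$out" 2>&1
}

# Poll every 10 seconds; take one snapshot per spike, re-arm once load drops.
watch_load() {
  local threshold="${1:-$LOAD_THRESHOLD}" armed=1
  while true; do
    if load_exceeds "$threshold"; then
      if [ "$armed" -eq 1 ]; then
        capture_snapshot "/tmp/info-$(date +%Y%m%d-%H%M%S).txt"
        armed=0
      fi
    else
      armed=1
    fi
    sleep 10
  done
}
```

Source the file as root and run `watch_load 50` (for example under nohup or screen); each spike then leaves a /tmp/info-*.txt file behind that can be sent over PM.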

Thanks and Best Regards,
Keith

Re: very high CPU load spikes

Posted: Tue Feb 15, 2022 7:07 pm
by murdock
Hi Keith,

I have sent you a PM as discussed.

Rob

Re: very high CPU load spikes

Posted: Wed Feb 16, 2022 1:17 pm
by ssax
Your system is showing I/O wait spikes (taken from the top command output):

Code: Select all

7.1 wa
This could be caused by security software such as CrowdStrike Falcon Sensor, which we see running on the system:
- I would try disabling it to see if that resolves the issue, as that would be my first guess at where the I/O wait is coming from

Code: Select all

root        687      1  0 Feb09 ?        00:00:00 /opt/CrowdStrike/falcond
root        688    687  0 Feb09 ?        01:09:13 falcon-sensor
Anything over 5% will generally cause system-wide performance issues, because it means the CPU spends that percentage of its time waiting on storage I/O before it can continue with the next request. The symptoms you see are work backing up on the CPU (increasing CPU usage), a rising load average, and so on. You will usually see other anomalies as well (checks timing out, etc.) that don't seem related but are.
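To check the `wa` figure independently of top, it can be computed from two samples of the aggregate `cpu` line in /proc/stat: the change in the iowait counter divided by the change in total CPU time. A minimal sketch, assuming the Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal after the `cpu` label); `iowait_pct` and `sample_iowait` are hypothetical helper names, and the result is roughly the same quantity top reports, not necessarily identical to it.

```shell
# Print the iowait percentage between two "cpu ..." lines from /proc/stat.
# Total time = user + nice + system + idle + iowait + irq + softirq + steal
# (fields 2-9; guest time is already counted inside user/nice, so it is skipped).
iowait_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    { t = 0; for (i = 2; i <= 9; i++) t += $i; tot[NR] = t; io[NR] = $6 }
    END {
      dt = tot[2] - tot[1]
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", (io[2] - io[1]) * 100 / dt
    }'
}

# Sample /proc/stat twice, $1 seconds apart (default 5), and print iowait %.
sample_iowait() {
  local interval="${1:-5}" before after
  before=$(grep '^cpu ' /proc/stat)
  sleep "$interval"
  after=$(grep '^cpu ' /proc/stat)
  iowait_pct "$before" "$after"
}
```

Running `sample_iowait 5` during a spike and again while the box is quiet gives two numbers to compare against the 5% rule of thumb.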

Let's look at the size of a few things. Please run these commands as root and send the output:
- NOTE: You may need to adjust -uroot and -pnagiosxi in the last two commands if you've changed the root MySQL password

Code: Select all

ulimit -a
su -s /bin/bash -c 'ulimit -a' nagios
su -s /bin/bash -c 'ulimit -a' mysql
su -s /bin/bash -c 'ulimit -a' apache
mysql -uroot -pnagiosxi nagios -e 'SELECT COUNT(*) FROM nagios_objects;'
mysql -uroot -pnagiosxi --table -e "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');"

Since you are seeing I/O wait, here are some things I'd recommend that can help:

1. Set up a RAM disk:

https://assets.nagios.com/downloads/nag ... giosXI.pdf


2. Edit /usr/local/nagios/etc/nagios.cfg and set the following so Nagios stops also writing its log to syslog:
- NOTE: The syslog entries duplicate /usr/local/nagios/var/nagios.log, so you'll still have access to the logs

Code: Select all

use_syslog=0
Then restart Nagios:

Code: Select all

systemctl restart nagios
3. Set ALL THREE Optimize Intervals to 300 or higher on the Admin > Performance Settings > Databases tab.
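If you manage more than one XI host, the nagios.cfg change in step 2 can be applied non-interactively. A minimal sketch, assuming GNU sed (as shipped on CentOS 7) and nagios.cfg's plain `key=value` lines; `set_cfg_option` is a hypothetical helper, not part of Nagios XI. It keeps a `.bak` copy, rewrites an existing `key=` line, or appends the setting if none exists. Values containing `|` would need escaping, since that is the sed delimiter used here.

```shell
# Set key=value in a simple key=value config file, keeping a .bak backup.
set_cfg_option() {
  local file="$1" key="$2" value="$3"
  cp -p -- "$file" "$file.bak" || return 1
  if grep -q "^${key}=" "$file"; then
    # Replace every active (uncommented) line for this key.
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    # Key not present yet: append it.
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Run it against /usr/local/nagios/etc/nagios.cfg with the `use_syslog` value from step 2, then restart nagios with systemctl as before.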

Re: very high CPU load spikes

Posted: Thu Feb 17, 2022 11:24 am
by murdock
I'm working on this and will follow up in a day or two (notably, our Security team has major issues with me interfering with their CrowdStrike). Thank you for being patient.

Rob

Re: very high CPU load spikes

Posted: Fri Feb 18, 2022 1:55 pm
by ssax
No problem, we'll keep an eye out for your update.

Re: very high CPU load spikes

Posted: Mon Feb 21, 2022 6:19 pm
by murdock
Sent an update via PM

Re: very high CPU load spikes

Posted: Tue Feb 22, 2022 1:29 pm
by ssax
Apologies, but can you get the output of this one while the CPU spike is occurring? The other one doesn't include each process's CPU/memory usage; this one does, and that will give us the information we need.

Code: Select all

ps -auxef > /tmp/info.txt 
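Since Rob mentioned an "explosion" of Python instances, a per-command roll-up can make the culprit obvious in a long process listing. A minimal sketch using standard procps `ps -o` fields and awk; `summarize_by_command` is a hypothetical helper name. Keep in mind that ps's %CPU is CPU time divided by the process's lifetime, not an instantaneous reading, so treat the totals as a rough indicator.

```shell
# Read "pcpu command" pairs on stdin (e.g. from: ps -eo pcpu=,comm=) and print
# "summed %CPU  process count  command", highest CPU first.
summarize_by_command() {
  awk '{
         c = $1
         $1 = ""; sub(/^ +/, "")   # the rest of the line is the command name
         n[$0]++; cpu[$0] += c
       }
       END { for (name in n) printf "%.1f %d %s\n", cpu[name], n[name], name }' |
    sort -k1,1nr
}
```

For example, `ps -eo pcpu=,comm= | summarize_by_command | head -n 15` run during a spike would show whether python dominates both the process count and the CPU total.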

Re: very high CPU load spikes

Posted: Wed Feb 23, 2022 12:17 pm
by murdock
Hi Sean,

Certainly; I'll follow up after the next spike occurs.

Rob

Re: very high CPU load spikes

Posted: Thu Feb 24, 2022 3:28 pm
by ssax
Thank you, received. I'll post an update shortly, after a remote session I have.