very high CPU load spikes

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
murdock
Posts: 66
Joined: Mon Oct 27, 2014 12:14 pm


Post by murdock »

Hi,

I've been experiencing frequent, brief, very high CPU load spikes since upgrading (XI 5.8.7 / CentOS 7.9) a few months ago, and the problem is getting worse. In the wake of these spikes I'm seeing timeouts and so many queued checks that the system stops processing and freezes up.

The load average shoots up to 300 or higher as an "explosion" of Python processes starts; then, typically within 3-5 minutes, things drift back to normal.

I'm not seeing anything to explain this in the logs; I've searched the forums and not found anything useful or directly applicable to this situation.

Can I DM a profile.zip and other details to someone?

Rob
kfanselow
Posts: 247
Joined: Tue Aug 31, 2021 3:25 pm

Re: very high CPU load spikes

Post by kfanselow »

Hi Rob,

How frequently is this occurring?

Go ahead and PM the profile to me; if you can generate it while the event is occurring, that would be even better.

Also, the next time you observe the event, please run the following as root and PM the file /tmp/info.txt to me as well.

Code:

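# Process list, then sockets, then sar history, each section separated by a timestamp
ps -axef > /tmp/info.txt
printf "\n========================== %s ==========================\n" "$(date)" >> /tmp/info.txt
ss -na >> /tmp/info.txt
printf "\n========================== %s ==========================\n" "$(date)" >> /tmp/info.txt
sar -A >> /tmp/info.txt   # sar is provided by the sysstat package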
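Since the spikes only last 3-5 minutes, it can be hard to run these commands by hand at the right moment. Below is a minimal sketch, assuming bash and a Linux /proc filesystem. It waits for the 1-minute load average to cross a threshold and then writes a similar snapshot automatically. The function names, the threshold of 50, the polling interval, and the use of `ps auxf` in place of `ps -axef` (to include per-process CPU/memory) are all assumptions of this sketch. None of them are provided by Nagios XI.

```shell
#!/bin/bash
# Hypothetical helper (not part of Nagios XI): wait for a load spike, then
# capture process, socket and sar diagnostics into one file.
# LOAD_THRESHOLD, POLL_SECONDS and OUT are assumed names/defaults; tune them.
LOAD_THRESHOLD=${LOAD_THRESHOLD:-50}
POLL_SECONDS=${POLL_SECONDS:-10}
OUT=${OUT:-/tmp/info.txt}

# Succeeds (exit 0) when the 1-minute load average in /proc/loadavg is >= $1.
# awk does the floating-point comparison that the shell's [ ] cannot.
load_exceeds() {
    awk -v t="$1" '{ exit !($1 >= t) }' /proc/loadavg
}

# Append one timestamped section per command to $OUT.
# 'ps auxf' includes per-process %CPU/%MEM in tree form; sar needs sysstat.
# A missing or failing command is noted in the file instead of aborting.
capture() {
    local cmd
    printf "\n##### load spike: %s (loadavg: %s) #####\n" \
        "$(date)" "$(cut -d' ' -f1-3 /proc/loadavg)" >> "$OUT"
    for cmd in "ps auxf" "ss -na" "sar -A"; do
        printf "\n========== %s: %s ==========\n" "$(date)" "$cmd"
        # $cmd is intentionally unquoted so it splits into command + arguments.
        $cmd || printf '(%s exited with status %d)\n' "$cmd" "$?"
    done >> "$OUT" 2>&1
}

# Poll until the threshold is crossed, take one snapshot, and return.
watch_and_capture() {
    until load_exceeds "$LOAD_THRESHOLD"; do
        sleep "$POLL_SECONDS"
    done
    capture
}
```

For example, from a root shell inside screen or tmux, run `source ./spike-capture.sh; watch_and_capture` (the file name is hypothetical). Pick a threshold comfortably above your normal load average so that only a real spike triggers the capture.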

Thanks and Best Regards,
Keith
murdock
Posts: 66
Joined: Mon Oct 27, 2014 12:14 pm

Re: very high CPU load spikes

Post by murdock »

Hi Keith,

I have sent you a PM as discussed.

Rob
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: very high CPU load spikes

Post by ssax »

Your system is showing I/O wait spikes (taken from the top command output):

Code:

7.1 wa
This could be caused by security software such as CrowdStrike Falcon Sensor, which we can see running on the system:
- I would try disabling it to see whether that resolves the issue, as that would be my first guess at where the I/O wait is coming from

Code:

root        687      1  0 Feb09 ?        00:00:00 /opt/CrowdStrike/falcond
root        688    687  0 Feb09 ?        01:09:13 falcon-sensor
Anything over 5% will generally cause system-wide performance issues, because it means the CPU spends that percentage of its time idle, waiting on storage I/O before it can continue with the next request. The symptoms you see are the CPU backing up (increasing CPU usage), the load average climbing (on Linux, processes blocked on disk I/O are counted in the load average), etc. You would usually see other anomalies as well (checks timing out, etc.) that would not seem related but are.
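The `wa` figure can also be measured directly. Below is a rough sketch, assuming a Linux kernel that exposes the standard /proc/stat CPU counters. It samples the counters twice and computes 100 × Δiowait / Δ(user + nice + system + idle + iowait + irq + softirq + steal), which approximates top's `wa`. It also lists processes in state D, which is one way to see what is currently blocked on I/O. The function names and the 1-second default interval are assumptions.

```shell
#!/bin/bash
# Sketch: approximate top's "wa" from the aggregate "cpu" line of /proc/stat.
# Fields after "cpu": user nice system idle iowait irq softirq steal guest guest_nice.
# guest/guest_nice are already included in user/nice, so they are not added again.
INTERVAL=${INTERVAL:-1}   # assumed sampling window, in seconds

# Print "<iowait> <total>" in clock ticks from the first "cpu " line.
read_cpu() {
    awk '/^cpu / { print $6, $2+$3+$4+$5+$6+$7+$8+$9; exit }' /proc/stat
}

# iowait% = 100 * delta(iowait) / delta(total) over INTERVAL seconds.
iowait_percent() {
    local w1 t1 w2 t2
    read -r w1 t1 < <(read_cpu)
    sleep "$INTERVAL"
    read -r w2 t2 < <(read_cpu)
    # The kernel's iowait counter can occasionally go backwards, so clamp at 0.
    awk -v dw="$((w2 - w1))" -v dt="$((t2 - t1))" 'BEGIN {
        p = (dt > 0) ? 100 * dw / dt : 0
        if (p < 0) p = 0
        printf "%.1f\n", p
    }'
}

# Processes in uninterruptible sleep (state D), usually blocked on I/O.
# Linux counts these in the load average even though they use no CPU.
d_state_procs() {
    ps -eo state=,pid=,comm= | awk '$1 == "D"'
}
```

For example, `echo "wa: $(iowait_percent)%"; d_state_procs`. If falcon-sensor or the database keeps showing up in state D during a spike, that would support the guess above.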

Let's take a look at some limits and sizes. Please run these commands as root and send the output:
- NOTE: You may need to adjust the -uroot and -pnagiosxi in the last two commands if you've changed the MySQL root password

Code:

ulimit -a
su -s /bin/bash -c 'ulimit -a' nagios
su -s /bin/bash -c 'ulimit -a' mysql
su -s /bin/bash -c 'ulimit -a' apache
mysql -uroot -pnagiosxi nagios -e 'SELECT COUNT(*) FROM nagios_objects;'
mysql -uroot -pnagiosxi --table -e "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');"

Since you are seeing I/O wait, here are some things I would recommend that can help:

1. Set up a RAM disk:

https://assets.nagios.com/downloads/nag ... giosXI.pdf


2. Edit your /usr/local/nagios/etc/nagios.cfg and set this:
- NOTE: Syslog only receives a duplicate of the data in /usr/local/nagios/var/nagios.log, so you'll still have access to the logs

Code:

use_syslog=0
Then restart nagios:

Code:

systemctl restart nagios
3. Set ALL THREE Optimize Intervals to 300 or higher in Admin > Performance Settings > Databases tab.
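For step 2, here is a hedged sketch of making the nagios.cfg change with a timestamped backup instead of editing the file by hand. The function name `set_cfg_option` and the `NAGIOS_CFG` variable are hypothetical. The default path is the one used in this thread.

```shell
#!/bin/bash
# Hypothetical helper: set key=value in nagios.cfg with a timestamped backup.
# NAGIOS_CFG's default is the path used in this thread; the function name is assumed.
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

set_cfg_option() {
    local file=$1 key=$2 value=$3
    # Keep a copy such as nagios.cfg.bak.20240101120000 before changing anything.
    cp -p "$file" "$file.bak.$(date +%Y%m%d%H%M%S)" || return 1
    if grep -q "^${key}=" "$file"; then
        # Replace the existing uncommented setting in place (key is used as a regex,
        # which is fine for plain names like use_syslog).
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        # Key absent (or only present commented out): append it.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

For example, run `set_cfg_option "$NAGIOS_CFG" use_syslog VALUE` with the value from step 2. Then, assuming the standard XI install path for the nagios binary, verify the configuration before restarting: `/usr/local/nagios/bin/nagios -v "$NAGIOS_CFG" && systemctl restart nagios`.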
murdock
Posts: 66
Joined: Mon Oct 27, 2014 12:14 pm

Re: very high CPU load spikes

Post by murdock »

I'm working on this and will follow up in a day or two (notably, our Security people have major issues with me interfering with their CrowdStrike). Thank you for being patient.

Rob
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: very high CPU load spikes

Post by ssax »

No problem, we'll keep an eye out for your update.
murdock
Posts: 66
Joined: Mon Oct 27, 2014 12:14 pm

Re: very high CPU load spikes

Post by murdock »

Sent an update via PM
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: very high CPU load spikes

Post by ssax »

Apologies, could you get the output of this one while the CPU spike is occurring? The previous one doesn't include the CPU/memory use of each process, and that will give us the information we need.

Code:

ps -auxef > /tmp/info.txt 
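If the full listing is unwieldy, a narrower snapshot may be easier to read during a spike. Below is a minimal sketch, assuming procps-ng `ps` and a Linux /proc filesystem. It lists the top CPU consumers and counts Python processes per parent PID, which may help show what is launching the "explosion" of Python instances. `TOP_N` and the `python*` name match are assumptions.

```shell
#!/bin/bash
# Sketch: a focused snapshot to go with the full process listing. Shows the top
# CPU consumers, then which parent PIDs spawned the Python processes.
# TOP_N and the "python*" name match are assumptions for this sketch.
TOP_N=${TOP_N:-25}

# Highest %CPU first; --sort is a procps-ng ps option. +1 keeps the header line.
top_cpu() {
    ps -eo pid,ppid,user,%cpu,%mem,stat,etime,args --sort=-%cpu | head -n "$((TOP_N + 1))"
}

# Read Name/PPid from /proc/<pid>/status and count python* processes per parent.
# Output: "<count> <ppid>", largest count first. A process that exits mid-scan
# is simply skipped.
python_parents() {
    local s
    for s in /proc/[0-9]*/status; do
        awk '/^Name:/ { n = $2 } /^PPid:/ { pp = $2 }
             END { if (n ~ /^python/) print pp }' "$s" 2>/dev/null || true
    done | sort | uniq -c | sort -rn
}
```

For example, `{ date; top_cpu; echo; python_parents; } > /tmp/info.txt`. Then `ps -fp <ppid>` shows what each parent is, for instance whether the Python processes come from check plugins or from something else.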
murdock
Posts: 66
Joined: Mon Oct 27, 2014 12:14 pm

Re: very high CPU load spikes

Post by murdock »

Hi Sean,

Certainly; I'll follow up after the next spike occurs.

Rob
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: very high CPU load spikes

Post by ssax »

Thank you, received. I'll post an update shortly, after a remote session I have.