High CPU/load avg

Posted: Tue Mar 02, 2021 8:36 am
by cbroschard
Good morning,

We keep seeing extremely high load on our server: the load average, especially at bootup after nagios starts, stays high for a good while before finally settling down. We also keep having issues where checks stop running until we restart the nagios service. I'm not sure of the root cause and was hoping for some pointers on what to check.

Thanks,

Chris Broschard

Re: High CPU/load avg

Posted: Tue Mar 02, 2021 10:15 am
by dchurch
Which process specifically is the culprit? Inspect the running processes to get a list.
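
One way to get that list, as a rough sketch only: the `top_cpu` helper name is made up for illustration, and the `--sort` option assumes the procps `ps` that ships with most Linux distributions.

```shell
#!/bin/sh
# Hypothetical helper: list the N processes using the most CPU (default 10).
# Assumes procps ps (Linux); BSD ps does not support --sort.
top_cpu() {
    count="${1:-10}"
    # count + 1 keeps the header row in the output
    ps -eo pid,user,pcpu,pmem,etime,args --sort=-pcpu | head -n "$((count + 1))"
}

top_cpu 10
```

Running it a few times while the load is high should show whether the same commands keep appearing at the top.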

If you PM me a system profile I can diagnose further. Get one by going to Admin (top menu) => System Profile (in the left menu), then clicking the blue button.

If you're unable to generate the profile through the web interface, please try generating it from the command line by running these commands as root:

rm -rf /usr/local/nagiosxi/var/components/profile*
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT
Then send me the resulting /usr/local/nagiosxi/var/components/profile.zip file.
If the profile script fails, please include the ENTIRE output.

Things you can try in the meantime:
- Run the database repair

Re: High CPU/load avg

Posted: Tue Mar 02, 2021 10:43 am
by cbroschard
I just PM'ed you the profile.zip file as requested. The load is mostly caused by the check processes, but also by nagios itself. We have the DB offloaded to another server, and that isn't seeing much load; only the front-end web server is.

Re: High CPU/load avg

Posted: Tue Mar 02, 2021 5:02 pm
by dchurch
Looks like some host checks are timing out. Even simple ones like check_snmp are sometimes taking more than 30 seconds. This could be caused by some sort of resource starvation, e.g. spotty network connectivity, CPU being taken up at boot time, or high disk usage while the machine is booting up.

Try disabling the nagios service, then rebooting and inspecting CPU usage as the machine starts up.
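
A rough sketch of that test, assuming a systemd-based install where the service unit is named `nagios` (worth confirming with `systemctl list-unit-files` first). The `sample_load` helper is made up for illustration; it reads `/proc/loadavg`, so it is Linux-only.

```shell
#!/bin/sh
# Before rebooting (as root), keep nagios from starting at boot.
# Assumption: the unit is called "nagios" on this system.
#   systemctl disable nagios
#   reboot
# Once the test is done, put it back:
#   systemctl enable nagios
#   systemctl start nagios

# Hypothetical helper: print a timestamped load average sample
# SAMPLES times, INTERVAL seconds apart.
sample_load() {
    samples="${1:-5}"
    interval="${2:-10}"
    i=0
    while [ "$i" -lt "$samples" ]; do
        # First three fields of /proc/loadavg: 1-, 5- and 15-minute load
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$(cut -d' ' -f1-3 /proc/loadavg)"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}

sample_load 3 1
```

If the load is still high with nagios disabled, something other than the checks is competing for resources at boot.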

Some other things I noticed that couldn't hurt:
- There are some Vim swap files in /usr/local/nagios/etc/static (*.sw[mnop]) - those can and probably should be removed.
- There are some Emacs auto-save files in /usr/local/nagios/etc (#*#) - these should also be removed.
(These two make me think someone's been editing the config files by hand instead of generating them through Nagios XI.)
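
To find those leftovers, something along these lines might help. It is only a sketch: `list_editor_leftovers` is a made-up helper name, and it just lists matches so they can be reviewed before anything is deleted.

```shell
#!/bin/sh
# Hypothetical helper: list Vim swap files (*.sw[mnop]) and Emacs
# auto-save files (#*#) under a directory. Nothing is deleted.
# Usage: list_editor_leftovers /usr/local/nagios/etc
list_editor_leftovers() {
    dir="${1:-/usr/local/nagios/etc}"
    find "$dir" -type f \( -name '*.sw[mnop]' -o -name '#*#' \) -print
}
```

Make sure nobody has one of those files open in an editor before removing anything from the list.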