Extremely High Load
-
cbroschard
- Posts: 15
- Joined: Wed Apr 17, 2013 10:54 am
Extremely High Load
Good afternoon,
We just upgraded our server to 5.5.7 today from 5.4.13 and now the load on our server is over 80. I'm using a simple w -u to check the load. How can I check what/why is stuck and causing our CPU run queue to go through the roof?
Thanks,
Chris Broschard
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Extremely High Load
Hello, @cbroschard. Please run these commands if you are on CentOS/RHEL 6.X:
service crond stop
service npcd stop
service nagios stop
service ndo2db stop
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
service mysqld restart
service ndo2db start
service nagios start
service npcd start
service crond start
Or run these commands if you are on CentOS/RHEL 7.X:
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
systemctl restart mariadb
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
Could you also send in your Nagios XI System Profile so I can review it?
To send us your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and send it to me in a private message.
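If you would like to see which message queues the ipcs loop in the commands above would remove before actually removing them, a dry run along these lines may help. This is only a sketch: the function name nagios_queue_ids is made up for illustration, and it assumes the usual Linux column layout of ipcs -q (key, msqid, owner, perms, used-bytes, messages). Unlike the grep in the loop, it matches the owner column exactly.

```shell
#!/bin/sh
# Hypothetical helper: read `ipcs -q` output on stdin and print the msqid
# of every queue whose owner column is exactly "nagios".
# Assumes the Linux column order: key msqid owner perms used-bytes messages.
nagios_queue_ids() {
    awk '$3 == "nagios" { print $2 }'
}

# Demo on made-up sample output. On a real system you would run:
#   ipcs -q | nagios_queue_ids
printf '%s\n' \
    '------ Message Queues --------' \
    'key        msqid      owner      perms      used-bytes   messages' \
    '0x4e010001 32768      nagios     600        0            0' \
    '0x00000000 65537      root       600        0            0' \
    | nagios_queue_ids
```

Each ID printed is one that ipcrm -q would be called with.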
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
-
cbroschard
- Posts: 15
- Joined: Wed Apr 17, 2013 10:54 am
Re: Extremely High Load
Ok I did all that and the problem came back. I rebooted and I was good for about 5-10 minutes and the problem came back. Apparently it's mrtg that is causing this and we don't have any graphs anymore either. I just have a ton of those processes taking up CPU and holding up the server. This started right after upgrading. I'm sending my profile.zip separately as you asked.
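For anyone else trying to work out which processes are piling up: the Linux load average counts tasks that are runnable (state R) or in uninterruptible sleep (state D), so tallying those by command name is one rough way to spot the culprit. A minimal sketch, with a made-up function name (top_commands); it only looks at the first word of each command name.

```shell
#!/bin/sh
# Hypothetical helper: read "STAT COMMAND" lines on stdin, keep only
# processes in state R (runnable) or D (uninterruptible sleep), and
# print a count per command name, busiest first.
top_commands() {
    awk '$1 ~ /^[RD]/ { count[$2]++ }
         END { for (c in count) print count[c], c }' | sort -rn
}

# Demo on made-up sample data. On a live system you would run:
#   ps -e -o stat= -o comm= | top_commands | head -n 10
printf '%s\n' 'R mrtg' 'D mrtg' 'S sshd' 'R perl' 'D mrtg' | top_commands
```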
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Extremely High Load
@cbroschard, Please remove --user=nagios and --group=nagios from the /etc/cron.d/mrtg cron:
*/5 * * * * root LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok
Then restart the crond:
service crond restart
Because of the amount of spooled perfdata it may take a while for the system to stabilize.
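If you would rather not edit the line by hand, the two options can be stripped with sed. This is a sketch only: the function name strip_mrtg_user_group is made up, and where the options sit in the upgraded cron line is an assumption, so check the result against your own /etc/cron.d/mrtg before replacing anything.

```shell
#!/bin/sh
# Hypothetical helper: remove the --user=nagios and --group=nagios options
# (and the whitespace in front of them) from text read on stdin.
strip_mrtg_user_group() {
    sed -E 's/[[:space:]]+--(user|group)=nagios//g'
}

# Assumed example of the cron line after the upgrade; the real line in
# your file may place the options elsewhere.
line='*/5 * * * * root LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok --user=nagios --group=nagios'
printf '%s\n' "$line" | strip_mrtg_user_group
```

To preview the change on the real file without modifying it, pipe the file through the function, for example with cat /etc/cron.d/mrtg | strip_mrtg_user_group.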
-
cbroschard
- Posts: 15
- Joined: Wed Apr 17, 2013 10:54 am
Re: Extremely High Load
Ok - I removed that and actually just restarted rather than waiting for it to die down. It's fine at the moment, do you think this is the ultimate fix for this problem or is it possible it could start up again?
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Extremely High Load
@cbroschard, For some reason, the new security addition to the mrtg cron does not work well on certain systems. We're still looking at the cause of this issue. The solution is good for now, as this is what the cron used to look like in previous versions of XI. If you upgrade in the future, there is a chance that the cron will get overwritten to include the user and group options again, but by that time there will hopefully be a bug fix.
-
cbroschard
- Posts: 15
- Joined: Wed Apr 17, 2013 10:54 am
Re: Extremely High Load
Ok that fixed the load issue, but now I'm not getting graphs for any of our servers that have more than 3 drives. If they have 3 drives or fewer it works; any more than that and it shows no data. Is there something else that I can do now to fix those? They did work fine on the previous release.
Re: Extremely High Load
Did the number of disks returned by the check change at all? The backend RRD files expect the same number of performance data sources they were originally created with, or else they won't update.
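One rough way to compare the two counts yourself is to count the data sources reported by rrdtool info and the label=value entries in the check's perfdata string. The sketch below assumes rrdtool info prints one ds[...].type line per data source; the function names count_rrd_ds and count_perfdata and the sample values are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count data sources in `rrdtool info` output on stdin
# by counting the ds[NAME].type lines (one per data source).
count_rrd_ds() {
    grep -c '^ds\[[^]]*\]\.type = '
}

# Hypothetical helper: count perfdata entries in a plugin perfdata string.
# Each entry has exactly one "=", so counting "=" characters is enough.
count_perfdata() {
    echo $(( $(printf '%s' "$1" | tr -cd '=' | wc -c) ))
}

# Demo on made-up sample data. On a real system you would run something like:
#   rrdtool info /path/to/service.rrd | count_rrd_ds
printf '%s\n' 'ds[1].index = 0' 'ds[1].type = "GAUGE"' \
              'ds[2].index = 1' 'ds[2].type = "GAUGE"' | count_rrd_ds
count_perfdata '/=10MB;20;30;0;40 /home=5MB;20;30;0;40 /var=1MB;20;30;0;40'
```

If the two numbers differ for a service, that would be consistent with the RRD no longer accepting updates.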
Please PM one of us the RRD and XML file for one of these disk check services that is experiencing the issue from:
/usr/local/nagios/share/perfdata/HOSTNAME/
Please PM one of us a fresh copy of your profile as well; you can download it from Admin > System Profile > Download Profile.
-
cbroschard
- Posts: 15
- Joined: Wed Apr 17, 2013 10:54 am
Re: Extremely High Load
I have just sent over the requested files via PM.
Re: Extremely High Load
Did you send to swilkerson? If so, he's out today, please send to me or npolovenko.