Extremely High Load
Posted: Mon Dec 10, 2018 1:05 pm
by cbroschard
Good afternoon,
We upgraded our server from 5.4.13 to 5.5.7 today, and now the load on the server is over 80. I'm checking the load with a simple w -u. How can I find out what is stuck, and why it is causing our CPU run queue to go through the roof?
Thanks,
Chris Broschard
Re: Extremely High Load
Posted: Mon Dec 10, 2018 4:01 pm
by npolovenko
Hello,
@cbroschard, please run these commands if you are on CentOS/RHEL 6.X:
service crond stop
service npcd stop
service nagios stop
service ndo2db stop
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
service mysqld restart
service ndo2db start
service nagios start
service npcd start
service crond start
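The message-queue loop in these steps takes the second column (the msqid) of every ipcs -q line that contains "nagios" and passes it to ipcrm -q. A slightly stricter sketch, assuming the usual Linux ipcs -q layout (key, msqid, owner, perms, ...), matches only the owner column, so a line that merely contains "nagios" elsewhere is not picked up. nagios_msqids is a hypothetical helper name, not part of Nagios XI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ipcs -q` output on stdin and print the msqid
# (column 2) of every message queue whose owner (column 3) is exactly $1.
# Header lines never match because their third field is not a user name.
nagios_msqids() {
  local owner="${1:-nagios}"
  awk -v owner="$owner" '$3 == owner { print $2 }'
}

# Usage on a live system (as root), equivalent to the grep/awk loop above:
#   for id in $(ipcs -q | nagios_msqids nagios); do ipcrm -q "$id"; done
```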
Or run these commands if you are on CentOS/RHEL 7.X:
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
systemctl restart mariadb
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
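The two command lists differ only in the service-control command (service vs. systemctl) and the database unit name (mysqld vs. mariadb). A minimal sketch that hides that difference, assuming that finding systemctl on PATH is a good enough proxy for CentOS/RHEL 7.X; svc, db_unit, and DRY_RUN are hypothetical names, not part of Nagios XI. The pkill, ipcrm, and rm steps are identical in both lists and are left unwrapped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one service action with systemctl when it is
# available (CentOS/RHEL 7.X) or with the older `service` command (6.X).
# Set DRY_RUN=1 to print the command instead of running it.
svc() {
  local action="$1" unit="$2"
  if command -v systemctl >/dev/null 2>&1; then
    set -- systemctl "$action" "$unit"
  else
    set -- service "$unit" "$action"
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: the database unit differs between the two lists,
# mariadb on 7.X and mysqld on 6.X.
db_unit() {
  if command -v systemctl >/dev/null 2>&1; then
    echo mariadb
  else
    echo mysqld
  fi
}

# Example: preview the stop order used in both lists, then the DB restart.
#   for u in crond npcd nagios ndo2db; do DRY_RUN=1 svc stop "$u"; done
#   DRY_RUN=1 svc restart "$(db_unit)"
```

Previewing with DRY_RUN=1 first is a cheap way to confirm the detection picked the right branch before anything is actually stopped.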
Could you also send in your Nagios XI System Profile so I can review it?
To send us your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu
Click the "Download Profile" button
Save the profile.zip file and send it to me in a private message.
Re: Extremely High Load
Posted: Mon Dec 10, 2018 4:41 pm
by cbroschard
OK, I did all that and the problem came back. I rebooted, and things were fine for about 5-10 minutes before the problem returned. Apparently mrtg is causing this, and we don't have any graphs anymore either. I just have a ton of mrtg processes taking up CPU and holding up the server. This started right after upgrading. I'm sending my profile.zip separately, as you asked.
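One way to confirm a pile-up like this, and to watch whether it clears, is to count the mrtg processes and read the load averages that w reports. A minimal sketch that reads /proc directly (so it does not depend on procps), assuming the cron-launched runs show up with the command name mrtg, which is what a script started via its shebang normally gets; count_by_comm and load_avg are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count processes whose command name
# (/proc/PID/comm) is exactly $1.
count_by_comm() {
  local name="$1" n=0 f c
  for f in /proc/[0-9]*/comm; do
    # A process can exit between the glob and the read; skip it quietly.
    { read -r c < "$f"; } 2>/dev/null || continue
    if [ "$c" = "$name" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Hypothetical helper: print the 1-, 5- and 15-minute load averages,
# the same three figures that `w` shows.
load_avg() {
  local one five fifteen rest
  read -r one five fifteen rest < /proc/loadavg
  echo "$one $five $fifteen"
}

# Example: sample every 30 seconds while the spooled perfdata drains.
#   while sleep 30; do echo "mrtg=$(count_by_comm mrtg) load=$(load_avg)"; done
```

A steadily rising mrtg count with a climbing 1-minute load suggests that new cron runs are starting before earlier ones finish.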
Re: Extremely High Load
Posted: Mon Dec 10, 2018 4:52 pm
by npolovenko
@cbroschard, please remove --user=nagios and --group=nagios from the cron entry in /etc/cron.d/mrtg so that it reads:
*/5 * * * * root LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok
Then restart crond (on CentOS/RHEL 7.X, use systemctl restart crond instead):
service crond restart
Because of the amount of spooled perfdata, it may take a while for the system to stabilize.
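The cron edit above can also be scripted, and the same check is handy after a later upgrade in case the entry is rewritten. A minimal sketch, assuming GNU sed and that the options appear literally as --user=nagios and --group=nagios somewhere on the mrtg line (their exact position in the shipped file is not shown in this thread); strip_mrtg_user_group and has_user_group are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the cron file still passes
# --user=nagios or --group=nagios to mrtg.
has_user_group() {
  local file="${1:-/etc/cron.d/mrtg}"
  grep -Eq -- '--(user|group)=nagios' "$file"
}

# Hypothetical helper: remove both options in place (GNU sed),
# keeping the original as FILE.bak.
strip_mrtg_user_group() {
  local file="${1:-/etc/cron.d/mrtg}"
  sed -i.bak -E 's/[[:space:]]+--(user|group)=nagios//g' "$file"
}

# Example (as root), followed by a crond restart:
#   has_user_group && strip_mrtg_user_group && service crond restart
```

Keeping a .bak copy makes it easy to compare against, or restore, the shipped entry if a later XI release changes it.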
Re: Extremely High Load
Posted: Mon Dec 10, 2018 5:09 pm
by cbroschard
OK, I removed that and just restarted rather than waiting for it to die down. It's fine at the moment. Do you think this is the ultimate fix for this problem, or could it start up again?
Re: Extremely High Load
Posted: Mon Dec 10, 2018 5:22 pm
by npolovenko
@cbroschard, for some reason the new security addition to the mrtg cron does not work well on certain systems. We're still looking into the cause of this issue. The solution is good for now, as this is what the cron entry looked like in previous versions of XI. If you upgrade in the future, there is a chance that the cron entry will be overwritten to include the user and group options again, but by then there will hopefully be a bug fix.
Re: Extremely High Load
Posted: Tue Dec 11, 2018 11:33 am
by cbroschard
OK, that fixed the load issue, but now I'm not getting graphs for any of our servers that have more than 3 drives. Servers with 3 or fewer drives graph fine; with any more, the graph shows no data. Is there anything else I can do to fix those? They worked fine on the previous release.
Re: Extremely High Load
Posted: Tue Dec 11, 2018 1:28 pm
by ssax
Did the number of disks returned by the check change at all? The backend RRD files expect the same number of performance data sources as when they were originally created; otherwise, they won't update.
Please PM one of us the RRD and XML files for one of the affected disk check services, from:
/usr/local/nagios/share/perfdata/HOSTNAME/
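The data-source mismatch described above can be checked locally by comparing how many data sources the RRD was created with against how many the service's XML file currently lists. A rough sketch, assuming rrdtool is installed, that rrdtool info prints one ds[NAME].index line per data source, and that the XML file lists one DATASOURCE element per value (as in the PNP-style XML kept alongside the RRD, which is assumed to reflect the latest check result). count_rrd_ds, count_xml_ds, check_ds_match, and the example service file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count data sources in `rrdtool info` output (stdin).
count_rrd_ds() {
  grep -c '^ds\[[^]]*]\.index'
}

# Hypothetical helper: count <DATASOURCE> opening tags in the XML (stdin).
count_xml_ds() {
  grep -c '<DATASOURCE>'
}

# Hypothetical helper: report both counts; succeed only if they match.
check_ds_match() {
  local rrd="$1" xml="$2" n_rrd n_xml
  n_rrd="$(rrdtool info "$rrd" | count_rrd_ds)"
  n_xml="$(count_xml_ds < "$xml")"
  echo "RRD data sources: $n_rrd, XML data sources: $n_xml"
  [ "$n_rrd" = "$n_xml" ]
}

# Example (service file name is hypothetical):
#   d=/usr/local/nagios/share/perfdata/HOSTNAME
#   check_ds_match "$d/Disk_Usage.rrd" "$d/Disk_Usage.xml"
```

If the counts differ, the RRD was created for a different number of disks than the check now reports, which matches the "no data" symptom.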
Please PM one of us a fresh copy of your profile as well; you can download it from Admin > System Profile > Download Profile.
Re: Extremely High Load
Posted: Tue Dec 11, 2018 3:09 pm
by cbroschard
I have just sent over the requested files via PM.
Re: Extremely High Load
Posted: Tue Dec 11, 2018 4:38 pm
by ssax
Did you send them to swilkerson? If so, he's out today; please send them to me or npolovenko.