Extremely High Load

cbroschard
Posts: 15
Joined: Wed Apr 17, 2013 10:54 am

Extremely High Load

Post by cbroschard »

Good afternoon,

We upgraded our server from 5.4.13 to 5.5.7 today, and now the load on the server is over 80. I'm using a simple w -u to check the load. How can I find out what is stuck, and why, so I can see what is driving our CPU run queue through the roof?

Thanks,

Chris Broschard
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Extremely High Load

Post by npolovenko »

Hello, @cbroschard. Please run these commands if you are on CentOS/RHEL 6.X:
service crond stop
service npcd stop
service nagios stop
service ndo2db stop
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
service mysqld restart
service ndo2db start
service nagios start
service npcd start
service crond start
Or run these commands if you are on CentOS/RHEL 7.X:
systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
systemctl restart mariadb
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
Could you also send in your Nagios XI System Profile so I can review it?
To send us your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" Menu
Click the "Download Profile" button
Save the profile.zip file and send it to me in a private message.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
cbroschard
Posts: 15
Joined: Wed Apr 17, 2013 10:54 am

Re: Extremely High Load

Post by cbroschard »

Ok I did all that and the problem came back. I rebooted and I was good for about 5-10 minutes and the problem came back. Apparently it's mrtg that is causing this and we don't have any graphs anymore either. I just have a ton of those processes taking up CPU and holding up the server. This started right after upgrading. I'm sending my profile.zip separately as you asked.
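One way to confirm which command is piling up, as described above, is to count running processes by command name. The following is a minimal sketch, not an official Nagios tool; the function name count_by_command is hypothetical and it assumes a ps that supports -eo comm=.

```shell
#!/bin/sh
# count_by_command (hypothetical helper): reads one command name per line
# on stdin and prints "COUNT NAME" pairs, most frequent first.
count_by_command() {
    sort | uniq -c | sort -rn | awk '{print $1, $2}'
}
```

On the affected server this could be run as: ps -eo comm= | count_by_command | head -5, alongside cat /proc/loadavg to see the load averages. A large count next to mrtg would match the symptom reported here.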
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Extremely High Load

Post by npolovenko »

@cbroschard, please remove --user=nagios and --group=nagios from the mrtg entry in /etc/cron.d/mrtg so that it looks like this:
*/5 * * * * root LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok
Then restart crond:
service crond restart
Because of the amount of spooled perfdata it may take a while for the system to stabilize.
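The cron edit above can also be scripted. This is a sketch only: the function name strip_mrtg_user_group is hypothetical, and it assumes the options appear exactly as --user=VALUE and --group=VALUE, each preceded by whitespace, somewhere on the mrtg line.

```shell
#!/bin/sh
# strip_mrtg_user_group (hypothetical helper): reads cron lines on stdin and
# removes any --user=VALUE and --group=VALUE options, together with the
# whitespace in front of them. All other text passes through unchanged.
strip_mrtg_user_group() {
    sed -E 's/[[:space:]]+--(user|group)=[^[:space:]]+//g'
}
```

A cautious way to apply it is to copy /etc/cron.d/mrtg to a backup first, run the backup through strip_mrtg_user_group, compare the result with the line shown above, and only then write it back and restart crond.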
cbroschard
Posts: 15
Joined: Wed Apr 17, 2013 10:54 am

Re: Extremely High Load

Post by cbroschard »

Ok - I removed that and actually just restarted rather than waiting for it to die down. It's fine at the moment. Do you think this is the permanent fix for this problem, or is it possible it could start up again?
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Extremely High Load

Post by npolovenko »

@cbroschard, for some reason the new security addition to the mrtg cron does not work well on certain systems, and we're still looking into the cause. The workaround is fine for now, since this is what the cron looked like in previous versions of XI. If you upgrade in the future, there is a chance the cron will be overwritten to include the --user and --group options again, but hopefully a bug fix will be available by then.
cbroschard
Posts: 15
Joined: Wed Apr 17, 2013 10:54 am

Re: Extremely High Load

Post by cbroschard »

Ok, that fixed the load issue, but now I'm not getting graphs for any of our servers that have more than 3 drives. If they have 3 drives or fewer it works; any more and it shows no data. Is there something else I can do to fix those? They worked fine on the previous release.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Extremely High Load

Post by ssax »

Did the number of disks returned by the check change at all? The backend RRD files expect the same number of performance data sources from the original creation or else they won't update.
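To compare the two counts mentioned above, something like the following could help. It is a rough sketch: both function names are hypothetical, count_rrd_ds expects the text printed by rrdtool info on stdin, and count_perfdata_items assumes perfdata labels contain no spaces (quoted labels with spaces would be miscounted).

```shell
#!/bin/sh
# count_rrd_ds (hypothetical helper): counts data sources in `rrdtool info`
# output by counting the lines of the form: ds[NAME].type = "..."
count_rrd_ds() {
    grep -c '^ds\[[^]]*]\.type = '
}

# count_perfdata_items (hypothetical helper): counts label=value items in a
# perfdata string read from stdin. Assumes labels contain no spaces.
count_perfdata_items() {
    tr ' ' '\n' | grep -c '='
}
```

For example, pipe rrdtool info for the service's .rrd file under the perfdata directory into count_rrd_ds, then echo the check's current perfdata string into count_perfdata_items. If the two numbers differ, the RRD will not update.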

Please PM one of us the RRD and XML file for one of these disk check services that is experiencing the issue from:

/usr/local/nagios/share/perfdata/HOSTNAME/
Please PM one of us a fresh copy of your profile as well. You can download it from Admin > System Profile > Download Profile.
cbroschard
Posts: 15
Joined: Wed Apr 17, 2013 10:54 am

Re: Extremely High Load

Post by cbroschard »

I have just sent over the requested files via PM.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Extremely High Load

Post by ssax »

Did you send to swilkerson? If so, he's out today, please send to me or npolovenko.