High load after upgrade to 5.4.2 - large # of cron jobs

Posted: Thu Feb 16, 2017 1:44 pm
by sgd
Hello,

We've been running Nagios XI for some years on an older server that is starting to fail. I had already planned on migrating to a new machine, and when the old one started locking up on a regular basis I accelerated the process.

The old machine was a 32-bit machine running CentOS 5.x. It had a single processor and was maxed out at 4GB of RAM, but ran Nagios XI decently. I built the new machine with 4 processor cores and 8GB of RAM, on a 64-bit machine running CentOS 7.x. I installed Nagios XI version 5.4.2 on the new machine and, per the migration instructions, upgraded the old machine to 5.4.2 (it was running a recent version, but not that one). Immediately after the upgrade the load on the old machine went from an average of 3.x to 60+, and remained high.

The machine also locked up shortly thereafter, so I proceeded with the migration, fearing the old machine would die completely before long. I was able to accomplish the migration, including converting the rrd databases, and shortly thereafter the new machine's load shot up as well. It's now over 60, and has been that high since yesterday. This is seriously impacting performance, and some of the checks are timing out because of it.

What I'm seeing is that perfdataproc.php is spawning every five minutes and not completing. There are 190 instances of it running right now.

I have some other migration issues, but until I get the load under control I can't be sure if they're due to the load or external factors.

If there's something simple I should do I'd like to hear about it. If this is likely going to be a difficult issue I'll open a support case.

To recap - this happened on the old machine immediately after upgrading, and also happened on the new machine as soon as I did a restore from the old machine, so I think it is related to the old machine's configuration.

The new machine is running in a VM, so I can increase resources if necessary, but having given it 4x the processor power and 2x the memory, plus substantially faster disks, I don't think offhand it's a resource starvation issue.

Thanks.

Re: High load after upgrade to 5.4.2 - large # of cron jobs

Posted: Thu Feb 16, 2017 1:47 pm
by sgd
Hi,

To be very clear, I installed CentOS 7 in a VM and installed Nagios XI on that. I did not install a VM image. I needed a custom filesystem layout to facilitate backups.

Cheers!

Re: High load after upgrade to 5.4.2 - large # of cron jobs

Posted: Thu Feb 16, 2017 2:10 pm
by rkennedy
What version of XI were you running on the CentOS 5 machine before the upgrade? This may help to identify what's going on, as behavior can change between versions. Did you by chance modify any custom commands that would have been reset to their defaults by a major upgrade?

The fact that it happened on both the CentOS 5 and the CentOS 7 machines makes me think it may be something in the configuration somewhere.

Can you also PM a profile to dwhitfield and me? (Admin -> System Profile -> Download Profile) This will include quite a few log files for us to review, to see if anything was impacted.

UPDATE: profile received and shared with techs

Re: High load after upgrade to 5.4.2 - large # of cron jobs

Posted: Thu Feb 16, 2017 2:23 pm
by sgd
Sent.

I'm sorry, I don't recall what version the old machine was running prior to the upgrade. It was recent, but not the immediately previous version. Automatic updates were not working, and I skipped a minor version or two before performing a manual upgrade. I did run the repair_databases.sh script before performing the upgrade. I still have the old machine online, but nagios is disabled. Is there a way to check the upgrade history via the command line?

Thanks!

Re: High load after upgrade to 5.4.2 - large # of cron jobs

Posted: Thu Feb 16, 2017 3:51 pm
by tgriep
I am seeing some MySQL connection errors in the log files.
Follow this KB article to increase the number of connections allowed to the MySQL database:
https://support.nagios.com/kb/article.php?id=513

I also see some permission errors. Can you run the following as root to fix the permissions?

chown nagios.nagios /usr/local/nagios/var/host-perfdata
chmod 775 /usr/local/nagios/var/host-perfdata
chown nagios.nagios /usr/local/nagios/var/service-perfdata
chmod 775 /usr/local/nagios/var/service-perfdata
chown -R nagios.nagios /usr/local/nagios/var/spool/
chmod 775 -R /usr/local/nagios/var/spool/

Then restart the cron daemon by running:

service crond restart

After that, run the following and post the output so we can check that the permissions are good and that the processes are running:

ls -l /usr/local/nagios/var/
ls -l /usr/local/nagios/var/spool/
ps -ef --cols=300

Re: High load after upgrade to 5.4.2 - large # of cron jobs

Posted: Thu Feb 16, 2017 4:19 pm
by sgd
Thanks for the quick answer!

Permissions seem to be the culprit - I'm sorry I didn't check on that before posting here.
I've fixed permissions and the load is steadily dropping - it's at 28.90 right now, down from over 60.

# ls -l /usr/local/nagios/var/
total 15440
drwxrwxr-x. 2 nagios nagios 81920 Feb 16 00:00 archives
-rw-r--r-- 1 nagios nagios 298 Feb 16 13:10 host-perfdata
-rw-r--r--. 1 nagios nagios 427 Feb 15 16:13 nagios.configtest
-rw-r--r-- 1 nagios nagios 6 Feb 15 16:13 nagios.lock
-rw-r--r-- 1 nagios nagios 1155139 Feb 16 13:10 nagios.log
-rw-rw-r--. 1 nagios nagios 583491 Jan 21 07:59 nagios.tmprGtLN5
-rw-rw-r--. 1 nagios nagios 584341 Jan 31 07:59 nagios.tmpYFsSmK
-rw-r--r-- 1 nagios nagios 5 Feb 15 16:03 ndo2db.lock
-rw-r--r--. 1 nagios nagios 0 Feb 15 16:13 ndomod.tmp
srwxr-xr-x 1 nagios nagios 0 Feb 15 16:03 ndo.sock
-rw-r--r--. 1 nagios nagios 569657 Feb 16 13:10 npcd.log
-rw-r--r--. 1 apache nagios 10485807 Jun 20 2012 npcd.log.old
-rw-r--r--. 1 apache nagios 395883 Dec 9 14:49 objects.cache
-rw-r--r--. 1 apache nagios 395883 Dec 9 14:54 objects.precache
-rw-rw-r--. 1 apache nagios 15962 Jun 14 2016 perfdata.log
-rw------- 1 nagios nagios 586414 Feb 16 12:13 retention.dat
drwxrwsr-x. 2 apache nagios 41 Feb 15 16:13 rw
-rw-r--r-- 1 nagios nagios 331 Feb 16 13:10 service-perfdata
drwxrwxr-x. 5 nagios nagios 55 Sep 23 2011 spool
drwxr-xr-x. 2 apache nagios 22 Jan 2 2015 stats
-rw-rw-r-- 1 nagios nagios 584515 Feb 16 13:10 status.dat

# ls -l /usr/local/nagios/var/spool
total 7424
drwxrwsr-x. 2 nagios nagios 21 Feb 15 16:13 checkresults
drwxr-xr-x. 2 nagios nagios 5787648 Feb 16 13:11 perfdata
drwxr-xr-x. 2 nagios nagios 6 Feb 16 13:11 xidpe
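
To keep an eye on whether the backlog of perfdataproc.php jobs is actually clearing, something like the following could be run every few minutes. It is only a sketch: the count_procs function name is hypothetical, and it assumes a Linux system with procps ps and /proc/loadavg.

```shell
#!/bin/sh
# Hypothetical helper: count the lines on stdin that match a pattern.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_procs() {
    grep -c -- "$1" || true
}

# Number of running perfdataproc.php instances. The escaped dot keeps
# grep's own command line from matching the pattern.
ps -eo args= | count_procs 'perfdataproc\.php'

# 1-minute load average, if /proc/loadavg is available.
if [ -r /proc/loadavg ]; then
    cut -d' ' -f1 /proc/loadavg
fi
```

If the first number keeps shrinking toward zero and the load follows it down, the stuck jobs are draining on their own.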

Re: High load after upgrade to 5.4.2 - large # of cron jobs

Posted: Thu Feb 16, 2017 4:22 pm
by dwhitfield
sgd wrote: I'm sorry I didn't check on that before posting here.
No problem at all. That's why we're here. Furthermore, these perfdata issues are tricky. They cause enough headaches that we are getting rid of them in Nagios 6:
New performance graphing to replace RRDs
There's no need to respond if things are working for you, but if the permissions turn out to be an incomplete fix, please let us know.