Performance Issues / fork() errors

abrist · Post by **abrist** » Mon Feb 18, 2013 12:46 pm

chrisp wrote: I'll let you know tomorrow, when we put it "live"...

I have to chuckle, as you have used the base CentOS install to more or less set up a netboot environment for the network install. I still think you were close to having the previous install (with new kernel) working, but sometimes fixing is not faster. Let us know how it goes, and don't hesitate to ask for help if needed.

chrisp · Post by **chrisp** » Mon Feb 18, 2013 1:02 pm

Well, the PXE boot wasn't referring to any dodgy partitions, so went fine. I agree, I was SO close, but the number of things I tried, just to get it to auto-boot onto its disk was just mental. I think the kernel panics were related to the /etc/fstab not being right, but hey ho, I'm back on top now.

abrist · Post by **abrist** » Mon Feb 18, 2013 1:06 pm

Fantastic. And with a real kernel to boot!

chrisp · Post by **chrisp** » Wed Feb 20, 2013 7:45 am

# uname -a
Linux Nagios 2.6.32-279.22.1.el6.x86_64 #1 SMP Wed Feb 6 03:10:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

We've backed up our old server & restored it onto the new one. The system is up and running (in parallel), with just a few service & host checks not working, but that's just because the targets don't yet trust the new server's IP, or DNS needs a tweak... Here's a humorous image to show the difference between the old host (16GB RAM, 4 CPU Cores & Software RAID1 HDDs) & the new host (32GB RAM, 8 CPU Cores & Software RAID1 SSDs): -

OldNagiosVsNewNagios.png

However, I've seen some issues with ndo2db & nagios starting up & rrdcached segfaulting on boot...

This is how it looks on clean boot: -

Code: Select all

root@Nagios:~# for SERVICE in nagios ndo2db mysqld postgresql rrdcached npcd ; do echo -n "${SERVICE}: " ; service ${SERVICE} status ; done
nagios: No lock file found in /usr/local/nagios/var/nagios.lock
ndo2db: ndo2db is not running but subsystem locked
mysqld: mysqld (pid  2066) is running...
postgresql: postmaster (pid  2103) is running...
rrdcached: rrdcached is stopped
npcd: NPCD running (pid 2277).

They're all set to start mostly as I'd expect: -

Code: Select all

# for SERVICE in nagios ndo2db mysqld postgresql rrdcached npcd ; do echo -n "${SERVICE}: " ; chkconfig --list ${SERVICE} ; done          
nagios: nagios          0:off   1:off   2:on    3:on    4:on    5:on    6:off
ndo2db: ndo2db          0:off   1:off   2:on    3:on    4:on    5:on    6:off
mysqld: mysqld          0:off   1:off   2:on    3:on    4:on    5:on    6:off
postgresql: postgresql          0:off   1:off   2:on    3:on    4:on    5:on    6:off
rrdcached: rrdcached            0:off   1:off   2:on    3:on    4:on    5:on    6:off
npcd: npcd              0:off   1:off   2:off   3:on    4:off   5:on    6:off

rrdcached on boot:

Code: Select all

Feb 19 16:23:25 Nagios abrtd: Directory 'ccpp-2013-02-19-16:23:25-2255' creation detected
Feb 19 16:23:25 Nagios abrtd: Executable '/usr/bin/rrdcached' doesn't belong to any package
Feb 19 16:23:25 Nagios abrtd: 'post-create' on '/var/spool/abrt/ccpp-2013-02-19-16:23:25-2255' exited with 1
Feb 19 16:23:25 Nagios abrtd: Corrupted or bad directory /var/spool/abrt/ccpp-2013-02-19-16:23:25-2255, deleting

on "service rrdcached restart"

Code: Select all

Feb 19 16:57:06 Nagios rrdcached[19201]: starting up
Feb 19 16:57:06 Nagios rrdcached[19201]: checking for journal files
Feb 19 16:57:06 Nagios rrdcached[19201]: journal processing complete
Feb 19 16:57:06 Nagios rrdcached[19201]: listening for connections

If I just manually restart rrdcached, ndo2db & nagios, they all start OK. Maybe it's some sort of start-order issue? Any clues welcome.
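One way to test the start-order theory is to look at the priority numbers on the init symlinks. This is only a sketch: it assumes the standard SysV layout on CentOS 6, where each runlevel directory such as /etc/rc.d/rc3.d holds links named `S<NN><service>` and lower numbers start first. The `start_order` helper is a hypothetical name, not something shipped with Nagios.

```shell
#!/bin/sh
# start_order RCDIR SERVICE...
# Print "<priority> <service>" for each service that has a start link
# (S<NN><service>) in RCDIR, sorted so the earliest starter comes first.
# Services with no start link in RCDIR are shown as "-- <service>".
start_order() {
    rcdir=$1
    shift
    for svc in "$@"; do
        found=""
        for link in "$rcdir"/S[0-9][0-9]"$svc"; do
            # An unmatched glob stays literal, so skip anything that is not a real entry.
            [ -e "$link" ] || [ -L "$link" ] || continue
            name=${link##*/}          # e.g. S99nagios
            prio=${name#S}            # e.g. 99nagios
            prio=${prio%"$svc"}       # e.g. 99
            echo "$prio $svc"
            found=1
        done
        [ -n "$found" ] || echo "-- $svc"
    done | LC_ALL=C sort
}

# Only run against the real directory if it exists on this machine.
if [ -d /etc/rc.d/rc3.d ]; then
    start_order /etc/rc.d/rc3.d nagios ndo2db mysqld postgresql rrdcached npcd
fi
```

If ndo2db or nagios turns out to have a lower number than mysqld, that would support the ordering theory; if not, the cause is probably elsewhere.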

scottwilkerson · Post by **scottwilkerson** » Wed Feb 20, 2013 11:30 am

That is quite the performance increase!

Here's what we have by default; it is slightly different:

Code: Select all

nagios: nagios          0:off   1:off   2:off   3:on    4:off   5:on    6:off
ndo2db: ndo2db          0:off   1:off   2:off   3:on    4:off   5:on    6:off
mysqld: mysqld          0:off   1:off   2:off   3:on    4:off   5:on    6:off
postgresql: postgresql          0:off   1:off   2:off   3:on    4:off   5:on    6:off
rrdcached: rrdcached            0:off   1:off   2:off   3:on    4:off   5:on    6:off
npcd: npcd              0:off   1:off   2:off   3:on    4:off   5:on    6:off

chrisp · Post by **chrisp** » Wed Feb 20, 2013 11:43 am

The performance increase is even more impressive when you know that the old host has "interval_length=180" in nagios.cfg, in order to cope at all.

I did "chkconfig <service> on", for postgresql, ndo2db & nagios, just in case that was an issue, so that explains the difference I think.

abrist · Post by **abrist** » Wed Feb 20, 2013 12:49 pm

If you make the rc changes, do you still experience the race conditions?

chrisp · Post by **chrisp** » Wed Feb 20, 2013 2:18 pm

Yes.

I did: -

Code: Select all

for SERVICE in nagios ndo2db postgresql rrdcached mysqld npcd ; do echo -n "${SERVICE}: " ; chkconfig ${SERVICE} off ; done

then: -

Code: Select all

for SERVICE in nagios ndo2db postgresql rrdcached mysqld npcd ; do echo -n "${SERVICE}: " ; chkconfig --levels 35 ${SERVICE} on ; done

After reboot: -

Code: Select all

# for SERVICE in nagios ndo2db postgresql rrdcached mysqld npcd ; do echo -n "${SERVICE}: " ; service ${SERVICE} status ; done      
nagios: No lock file found in /usr/local/nagios/var/nagios.lock
ndo2db: ndo2db is not running but subsystem locked
postgresql: postmaster (pid  2098) is running...
rrdcached: rrdcached is stopped
mysqld: mysqld (pid  2061) is running...
npcd: NPCD running (pid 2272).
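The "not running but subsystem locked" status usually means a lock file was left behind with no matching process. A rough way to check for that is sketched below. It assumes the RHEL/CentOS convention of lock files under /var/lock/subsys named after the service, and that the process name matches the service name; `lock_state` is a hypothetical helper, and the real init scripts may use different paths.

```shell
#!/bin/sh
# lock_state NAME [LOCKDIR]
# Report whether NAME has a subsystem lock file and whether a process
# with exactly that name is running.
# LOCKDIR defaults to /var/lock/subsys (assumed RHEL/CentOS convention).
lock_state() {
    name=$1
    lockdir=${2:-/var/lock/subsys}
    if [ ! -e "$lockdir/$name" ]; then
        echo "$name: no lock file"
    elif pgrep -x "$name" >/dev/null 2>&1; then
        echo "$name: running, lock file present"
    else
        echo "$name: stale lock file ($lockdir/$name)"
    fi
}

for SERVICE in ndo2db rrdcached; do
    lock_state "$SERVICE"
done
```

A stale lock right after boot would suggest the daemon started and then died, rather than never being started at all.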

abrist · Post by **abrist** » Wed Feb 20, 2013 2:25 pm

Hmmm. If we cannot get these conditions resolved, you may have to scrape together a custom init script to fire things up in the proper order.

chrisp · Post by **chrisp** » Wed Feb 20, 2013 4:33 pm

You're right, I need to get this implementation live sooner rather than later!

I knocked up a script to kick the 3 problem services after a guesstimated safe delay, run from /etc/rc.local:

Code: Select all

#!/bin/sh
#weirdNagiosStartupProblemFix

sleep 15s

for SERVICE in rrdcached ndo2db nagios
do
    /sbin/service ${SERVICE} restart
done

A reboot later and the system is up and running unattended and without manual intervention.
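A possible refinement of the fixed 15-second sleep is to poll until a dependency reports ready, with a timeout as a safety net. The sketch below is not what was deployed above. The `wait_for` helper is a hypothetical name, and waiting on mysqld is only an assumption, since the thread never establishes which dependency (if any) the three services were racing.

```shell
#!/bin/sh
# wait_for TIMEOUT CMD [ARGS...]
# Re-run CMD once a second until it succeeds (returns 0) or TIMEOUT
# seconds have passed (returns 1).
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example use from /etc/rc.local (commented out; assumes that
# "service mysqld status" exits 0 once mysqld is up):
#
# if wait_for 60 /sbin/service mysqld status; then
#     for SERVICE in rrdcached ndo2db nagios; do
#         /sbin/service "${SERVICE}" restart
#     done
# fi
```

Compared with a plain sleep, this returns as soon as the check passes and gives up after a known upper bound instead of guessing a delay.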

Nagios Support Forum
