Re: Performance Issues / fork() errors

Posted: Wed Feb 20, 2013 5:23 pm
by chrisp
I spoke too soon.

Posted: Wed Feb 20, 2013 5:25 pm
by scottwilkerson
chrisp wrote: I spoke too soon.
That didn't work?

Posted: Thu Feb 21, 2013 5:14 am
by chrisp
Nope. I'm losing the will to live.

Posted: Thu Feb 21, 2013 11:11 am
by abrist
Well, don't notarize the living will just yet. Maybe the sleep is not long enough, or the services are starting in the wrong order. You could also have race conditions between those 3 services; perhaps add a sleep between each service start? Ideally you will not be rebooting often, but I understand the need for production boxes to come back up unattended after a reboot.
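One way to act on the race-condition idea, rather than guessing at a fixed sleep, is to start each service and poll its status until it reports running (or a timeout expires) before moving on to the next. This is a minimal sketch, assuming the init scripts print "is running" when healthy (the same test the stuffstarter script later in this thread relies on) and assuming the start order rrdcached, ndo2db, nagios taken from that script. The function names and the `SERVICE_CMD` and `WAIT_SECS` variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: start services one at a time, waiting for each to
# report "is running" before starting the next one.

# Command used to control services; overridable (e.g. for a dry run).
SERVICE_CMD=${SERVICE_CMD:-/sbin/service}
# Maximum seconds to wait for each service to come up (assumed value).
WAIT_SECS=${WAIT_SECS:-60}

# wait_running NAME: poll "NAME status" once a second until the output
# contains "is running"; return 1 if WAIT_SECS elapse first.
wait_running() {
    i=0
    while [ "$i" -lt "$WAIT_SECS" ]; do
        if $SERVICE_CMD "$1" status 2>&1 | grep -q 'is running'; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# start_in_order NAME...: start each service in turn and stop at the
# first one that does not come up in time.
start_in_order() {
    for svc in "$@"; do
        $SERVICE_CMD "$svc" start >/dev/null 2>&1
        if ! wait_running "$svc"; then
            echo "$svc did not report running within ${WAIT_SECS}s" >&2
            return 1
        fi
        echo "$svc: running"
    done
}
```

From whatever runs at boot (rc.local, for instance), this would be invoked as `start_in_order rrdcached ndo2db nagios`. Polling ties each step to the previous service actually being up, instead of hoping a fixed delay is long enough.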

Posted: Thu Feb 21, 2013 11:19 am
by scottwilkerson
chrisp wrote:

nagios: No lock file found in /usr/local/nagios/var/nagios.lock
ndo2db: ndo2db is not running but subsystem locked

Looking at this again, the first item suggests the nagios.lock file is not being created. The second item only appears if the lock file exists but the process is not running. Can we look at the following permissions?

ls -l /usr/local/nagiosxi/var/subsys
ls -ld /usr/local/nagiosxi/var/subsys
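The second case described above, a lock file that exists while its process is gone, is a stale lock. A minimal sketch of such a check, assuming the daemon records its PID in a pid/lock file (Nagios writes its PID to the `lock_file` named in nagios.cfg). The function name is hypothetical and this is not the actual Nagios XI init-script logic.

```shell
#!/bin/sh
# Hypothetical helper: classify a service from its subsys lock file and
# its pid file, mirroring the "not running but subsystem locked" message.

# lock_state LOCKFILE PIDFILE prints one of:
#   running - the pid file names a live process
#   stale   - the lock file exists but no live process was found
#   stopped - neither a live process nor a lock file
lock_state() {
    lock=$1
    pidfile=$2
    if [ -s "$pidfile" ]; then
        pid=$(cat "$pidfile")
        # kill -0 sends no signal; it only tests that the PID exists.
        # Run as root: against another user's process it fails with EPERM.
        if kill -0 "$pid" 2>/dev/null; then
            echo running
            return 0
        fi
    fi
    if [ -e "$lock" ]; then
        echo stale
    else
        echo stopped
    fi
}
```

Using paths that appear in this thread, `lock_state /usr/local/nagiosxi/var/subsys/nagios /usr/local/nagios/var/nagios.lock` would print `stale` in exactly the situation the two status messages describe.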

Posted: Thu Feb 21, 2013 11:39 am
by chrisp
@abrist: I toyed with various sleep lengths, all the way up to 300s.

# ls -l /usr/local/nagiosxi/var/subsys
total 4
-rw-r--r-- 1 nagios nagios 0 Feb 20 22:22 nagios
-rw-r--r-- 1 nagios nagios 0 Feb 20 22:22 ndo2db
-rw-r--r-- 1 nagios nagios 4 Feb 20 22:21 npcd.pid

# ls -ld /usr/local/nagiosxi/var/subsys
drwxr-xr-x 2 nagios nagios 4096 Feb 20 22:22 /usr/local/nagiosxi/var/subsys
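In that listing the subsys directory is owned by nagios with mode 755 (drwxr-xr-x), so the nagios user can create and remove lock files in it. The same check can be scripted; this is a small sketch assuming GNU coreutils `stat` (its `-c` format option, where `%U` is the owner name and `%a` the octal mode). The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: confirm DIR is owned by USER and that the owner has
# write and search (execute) permission, which creating lock files needs.
# Assumes GNU coreutils stat.

owner_can_write() {
    dir=$1
    user=$2
    [ -d "$dir" ] || return 1
    owner=$(stat -c '%U' "$dir") || return 1
    mode=$(stat -c '%a' "$dir") || return 1
    [ "$owner" = "$user" ] || return 1
    # Leading 0 makes shell arithmetic read the mode as octal; the owner
    # digit is the third from the right (755 -> 7, 1777 -> 7).
    owner_bits=$(( (0$mode / 64) % 8 ))
    # Need write (2) and execute/search (1) on the directory.
    [ $(( owner_bits & 3 )) -eq 3 ]
}
```

For example, `owner_can_write /usr/local/nagiosxi/var/subsys nagios && echo ok` should print `ok` for the permissions shown above.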

Posted: Thu Feb 21, 2013 7:23 pm
by scottwilkerson
We are trying to replicate this but are not having any luck. Is there any special configuration on this system?

Posted: Thu Feb 21, 2013 8:06 pm
by chrisp
A RAM disk is in play, but nothing exotic, unless Gavin says otherwise...

In the meantime, this is sorting things out while we figure out the rest:

# crontab -l

## Restart any errant programs
## (>/dev/null 2>&1 discards both stdout and stderr)
* * * * * /home/admin/bin/stuffstarter >/dev/null 2>&1
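A note on redirection order in that cron line: redirections are applied left to right, so `2>&1 >/dev/null` first points stderr at wherever stdout currently goes (the output cron mails to the crontab owner) and only then sends stdout to /dev/null, whereas `>/dev/null 2>&1` discards both. The difference shows up with a throwaway function (`noisy` is hypothetical):

```shell
#!/bin/sh
# noisy: hypothetical stand-in for a cron job that writes to both streams.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stderr is duplicated onto the current stdout first, then stdout is
# discarded: only the stderr line survives.
leaky=$(noisy 2>&1 >/dev/null)

# stdout is discarded first, then stderr follows it: nothing survives.
quiet=$(noisy >/dev/null 2>&1)

echo "2>&1 >/dev/null leaves: '$leaky'"
echo ">/dev/null 2>&1 leaves: '$quiet'"
```

Running it prints `'to stderr'` for the first form and `''` for the second.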

# cat /home/admin/bin/stuffstarter

#!/bin/sh
## stuffstarter - cuz sometimes stuff just doesn't start...

EMAIL="chrisp"
HOSTNAME=$(hostname)
TIME=$(date)
SERVICES="rrdcached ndo2db nagios"

for SERVICE in ${SERVICES}
do
    # The pipeline's exit status is grep's: success only if the status
    # output contains "is running". Testing it directly also avoids the
    # bash-only "==" inside [ ], which shells such as dash reject.
    if /sbin/service "${SERVICE}" status 2>&1 | grep -q 'is running'
    then
        echo "${SERVICE}: OK"
    else
        echo "${SERVICE}: Problem"
        # Mail the status and restart output, stderr included, to ${EMAIL}
        (
            echo ""
            echo "${HOSTNAME} @ ${TIME}"
            echo ""
            echo "$0 is restarting ${SERVICE}"
            echo ""
            /sbin/service "${SERVICE}" status
            echo ""
            /sbin/service "${SERVICE}" restart
            echo ""
        ) 2>&1 | /bin/mail -s "Restarting ${SERVICE}" "${EMAIL}"
    fi
done

exit 0
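The status check works because `grep -q 'is running'` needs a literal match: "is not running" does not contain the substring "is running", so the "ndo2db is not running but subsystem locked" case quoted earlier counts as a problem and triggers a restart. A quick way to confirm the pattern against sample status lines (`status_ok` is a hypothetical name, and "nagios (pid 1234) is running..." is an assumed example of healthy output):

```shell
#!/bin/sh
# status_ok: hypothetical wrapper around the same test stuffstarter uses;
# reads status text on stdin and succeeds only if it says "is running".
status_ok() {
    grep -q 'is running'
}

for line in \
    "nagios (pid 1234) is running..." \
    "ndo2db is not running but subsystem locked" \
    "nagios: No lock file found in /usr/local/nagios/var/nagios.lock"
do
    if printf '%s\n' "$line" | status_ok; then
        echo "OK:      $line"
    else
        echo "Problem: $line"
    fi
done
```

Only the first line is reported as OK; the other two, both taken from this thread, are treated as problems.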

Posted: Fri Feb 22, 2013 7:57 am
by scottwilkerson
We will continue to try to replicate this problem here and post back what we find.