Performance Issues / fork() errors

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
chrisp
Posts: 71
Joined: Fri Dec 28, 2012 11:35 am

Re: Performance Issues / fork() errors

Post by chrisp »

I spoke too soon.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Performance Issues / fork() errors

Post by scottwilkerson »

chrisp wrote: I spoke too soon.
That didn't work?
Former Nagios employee
chrisp
Posts: 71
Joined: Fri Dec 28, 2012 11:35 am

Re: Performance Issues / fork() errors

Post by chrisp »

Nope. I'm losing the will to live.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Performance Issues / fork() errors

Post by abrist »

Well, don't notarize the living will just yet. Maybe the sleep is not long enough, or the order is not correct. You could also have race conditions concerning those 3 services. Maybe sleep between each service start? Ideally you will not be rebooting often, but I understand the desire and requirement for production boxes to come back up unattended after a reboot.
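The "sleep between each service start" idea could be sketched roughly as below. This is only an illustration, not the actual init configuration on the affected box: `SERVICE_CMD`, `START_DELAY` and `start_in_order` are hypothetical names, and the service list in the example call is an assumption based on the services discussed in this thread.

```shell
#!/bin/sh
# Sketch only: start services one at a time, pausing after each one.
# SERVICE_CMD and START_DELAY are hypothetical knobs for this example.
SERVICE_CMD="${SERVICE_CMD:-/sbin/service}"
START_DELAY="${START_DELAY:-30}"

start_in_order() {
    # Arguments are service names, in the order they should come up.
    for svc in "$@"
    do
        "$SERVICE_CMD" "$svc" start
        # Give the service time to settle before starting the next one.
        sleep "$START_DELAY"
    done
}

# Example call (assumed order, dependencies first):
# start_in_order rrdcached ndo2db nagios
```

Starting the services strictly one after another removes any race between them, at the cost of a slower boot.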
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Performance Issues / fork() errors

Post by scottwilkerson »

chrisp wrote:

Code:

nagios: No lock file found in /usr/local/nagios/var/nagios.lock
ndo2db: ndo2db is not running but subsystem locked
Looking at this again, the first message suggests that the nagios.lock file is not being created. The second message will only appear if the lock file exists but the process is not running. Can we look at the following permissions?

Code:

ls -l /usr/local/nagiosxi/var/subsys
ls -ld /usr/local/nagiosxi/var/subsys
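The "not running but subsystem locked" condition described above (lock file present, process absent) can also be checked directly. The following is a rough sketch under assumptions: `lock_state` is a hypothetical helper, and it assumes that the file under the subsys directory is the lock in question and that the process name matches what `pgrep -x` sees.

```shell
# Hypothetical helper: compare a subsys lock file against the process table.
# $1 = lock file path, $2 = exact process name (as matched by pgrep -x)
lock_state() {
    lock="$1"
    name="$2"
    if pgrep -x "$name" >/dev/null 2>&1
    then
        running=yes
    else
        running=no
    fi

    if [ -e "$lock" ] && [ "$running" = no ]
    then
        echo "stale lock"            # lock exists, process is gone
    elif [ ! -e "$lock" ] && [ "$running" = yes ]
    then
        echo "running without lock"  # process is up, lock is missing
    elif [ "$running" = yes ]
    then
        echo "ok"
    else
        echo "stopped"
    fi
}

# Example call (path assumed from the directory listed above):
# lock_state /usr/local/nagiosxi/var/subsys/ndo2db ndo2db
```

A "stale lock" result right after boot would point at a leftover lock file rather than a permissions problem.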
chrisp
Posts: 71
Joined: Fri Dec 28, 2012 11:35 am

Re: Performance Issues / fork() errors

Post by chrisp »

@abrist: I tried varying sleeps, all the way up to 300s.

Code:

# ls -l /usr/local/nagiosxi/var/subsys
total 4
-rw-r--r-- 1 nagios nagios 0 Feb 20 22:22 nagios
-rw-r--r-- 1 nagios nagios 0 Feb 20 22:22 ndo2db
-rw-r--r-- 1 nagios nagios 4 Feb 20 22:21 npcd.pid

# ls -ld /usr/local/nagiosxi/var/subsys
drwxr-xr-x 2 nagios nagios 4096 Feb 20 22:22 /usr/local/nagiosxi/var/subsys
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Performance Issues / fork() errors

Post by scottwilkerson »

We are trying to replicate this but are not having any luck. Is there any special configuration on this system?
chrisp
Posts: 71
Joined: Fri Dec 28, 2012 11:35 am

Re: Performance Issues / fork() errors

Post by chrisp »

RAMdisk is in play, but nothing exotic, unless Gavin says otherwise...

This is keeping things running while we figure it out:

Code:

# crontab -l

## Restart any errant programs
* * * * * /home/admin/bin/stuffstarter >/dev/null 2>&1
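
A note on the redirection in a cron entry like this one: the shell applies redirections left to right, so `2>&1 >/dev/null` and `>/dev/null 2>&1` behave differently. A minimal sketch, where `emit` is a hypothetical stand-in for any command that writes to both streams:

```shell
# emit writes one line to stdout and one to stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stderr is pointed at the current stdout first, then only stdout is
# discarded, so the stderr line still gets through.
leaky=$(emit 2>&1 >/dev/null)

# stdout is discarded first, then stderr follows it to /dev/null.
quiet=$(emit >/dev/null 2>&1)

echo "leaky=[$leaky] quiet=[$quiet]"
```

Under cron, anything that still gets through is typically mailed to the crontab owner, which adds up quickly for a job that runs every minute.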

Code:

# cat /home/admin/bin/stuffstarter

#!/bin/sh
## stuffstarter - cuz sometimes stuff just doesn't start...

EMAIL="chrisp"
HOSTNAME=`hostname`
TIME=`date`
SERVICES="rrdcached ndo2db nagios"

for SERVICE in ${SERVICES}
do
    # Test the pipeline's exit status directly; `==` inside [ ] is not POSIX sh.
    if /sbin/service ${SERVICE} status 2>&1 | grep -q 'is running'
    then
        echo "${SERVICE}: OK"
    else
        echo "${SERVICE}: Problem"
        (
            echo ""
            echo "${HOSTNAME} @ ${TIME}"
            echo ""
            echo "$0 is restarting ${SERVICE}"
            echo ""
            /sbin/service ${SERVICE} status
            echo ""
            /sbin/service ${SERVICE} restart
            echo ""
        ) | /bin/mail -s "Restarting ${SERVICE}" ${EMAIL}
    fi
done

exit 0
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Performance Issues / fork() errors

Post by scottwilkerson »

We will continue to try to replicate this problem here and post back what we find.